THE DIMENSIONALITY OF UNION-MANAGEMENT
RELATIONS AT THE LOCAL LEVEL 


ROSS STAGNER 
Wayne State University 


MILTON DERBER AND W. ELLISON CHALMERS


University of Illinois 


In developing systematic descriptions of 
any particular class of natural phenomena, 
two contrasting approaches have been widely 
employed. We may identify these briefly as 
the typological and the dimensional ap- 
proaches. One may, to bring order into a 
mass of observations, cluster cases into types; 
or he may arrange them according to quantified
dimensions and study the relationships
among these dimensions.1

Research on union-management relations 
has made some use of types (cf. Harbison 
and Coleman, 1951; Selekman, 1949). The 
types described were based upon observation 
by experts, but in neither case were precise 
criteria laid down by which establishments 
could be assigned to a particular category. 
In the Illini City studies (University of Illinois:
1953, 1954) some experimentation with
types based on a more precise set of opera- 
tions was reported. Following up the Illini 
City work, the authors have recently (1957) 
reported on a set of empirical clusters deter- 
mined in a purely objective manner by
reference to quantitative scores for the estab- 
lishments concerned. The data indicated that 
meaningful types could be isolated in this 
fashion, and that these types were associated 
with environmental variables (Derber, Chal- 
mers, & Stagner, 1958) in logically consistent 


Milton Edelman of Southern Illinois University, and 
to former graduate assistants, Herbert Schaffer, Rob- 
ert Ver Nooy, Robert Mitchell, Sheldon Luskin and 
John Tipton, who aided in various phases of the 
collection and analysis of data. 


It is, however, possible to analyze the same 
data in a strictly dimensional manner. The 
technique of factor analysis permits us to 
break down a pattern of relationships among 
data into independent underlying dimensions. 
This may lead to the combining of several 
variables into a single dimension, or it may 
indicate that a particular raw score is related 
to more than one independent dimension. 

The purpose of the present investigation is 
to apply the factor-analytic method to the 
same data used in the typological investiga- 
tion, in the hope of reaching some conclusion 
as to the relative fruitfulness of the two 
approaches. 

Population and variables studied. The 
analysis is based upon 41 establishments in 
three downstate Illinois communities, ranging 
in size from 73 to 2100 hourly employees, in- 
cluding utilities, service and manufacturing 
enterprises (the latter involving both producer 
and consumer goods). Data were collected 
by intensive interviews with the two top union 
officials and the two top management persons 
in labor relations.2

The present article is based on statistical 
treatment of 35 variables. Twenty of these 
were scored by consensus of the responses 
given by the four respondents.3 In the case


2 A more detailed report of these interview procedures
and a more extensive definition of the process
variables (1-20) have been published elsewhere
(Derber et al., 1957).

3 Consensus was operationally defined as agreement
by at least three out of four respondents on a certain
point. In the case of “factual” items, one spokesman
from each side was reinterviewed if consensus
was not obtained at first. On “subjective” items such
as satisfaction and attitude, the numerical values of
the answers chosen were simply averaged to get the
company or union position.

of attitudes and satisfactions, it was necessary
to compute separate management and
union scores for obvious reasons. Three were
purely objective. A complete list of the variables
with brief identification is as follows:


1. Autonomy—contract: freedom of local
union and management to work out
contract terms.

2. Autonomy—grievance: same for grievance
settlements.

3. Autogeny—contract: reaching a contract
without mediators, arbitrators, etc.

4. Autogeny—grievance: same for grievance
settlements.

5. Influence—scope: variety of issues on
which union has a voice by contract or
practice.

6. Influence—depth: depth of penetration of
union on certain topics such as security,
seniority, and degree of participation in
decisions on safety rules, discipline,
technological changes, and time study.

7. Pressure—contract: strikes, lockouts or
threats thereof, during negotiations as
pressure devices.

8. Pressure—grievance: same for grievance
discussions.

9. Yielding: concessions made by management
to avoid trouble.

10. Legalism—grievance: reliance on technicalities
of contract in resolving grievances.

11. Past practice—contract: appeals to past
practice during negotiations.

12. Past practice—grievance: same for grievance
discussions.

13. Initiative: degree to which management
initiates proposals on contract or grievance
level.

14. Speed: speed of settlement of typical
grievance.

15. Emotional tone—contract: reported
friendliness and trust or hostility and
suspicion in negotiations.

16. Emotional tone—grievance: same for
grievances.

17. Understanding—contract: degree to which
the parties understand the intentions of
each other.

18. Understanding—grievance: same for
grievance discussions.

19. Concession—contract: taking into account
the goals and problems of the other side
during negotiations.

20. Concession—grievance: same for grievances.

21. Attitude—management: a scaled score
for management on approval-disapproval
of the union.

22. Attitude—union: same for union approval
of the management.

23. Satisfaction—scope, M: Management’s
satisfaction with scope of union influence
(Variable 5).

24. Satisfaction—scope, U: Union’s satisfaction
with scope (Variable 5).

25. Satisfaction—grievance, M: Management’s
satisfaction with the procedure
and manner of grievance handling.

26. Satisfaction—grievance, U: same for
union.

27. Satisfaction—depth, M: Management’s
satisfaction with depth of union influence
(Variable 6).

28. Satisfaction—depth, U: same for union.

29. Satisfaction—contract manner, M: Management’s
satisfaction with the mode of
negotiating contract (cf. Variables 7, 11,
13, 15).

30. Satisfaction—contract manner, U: same
for union.

31. Satisfaction—wages and fringes, M: Management’s
satisfaction with wage and
fringe benefits being paid.

32. Satisfaction—wages and fringes, U: same
for union.

33. Hourly earnings: a figure representing
the average wage for hourly rated workers
only.

34. Size: number of hourly rated workers.

35. Skill ratio: estimated proportion of labor
force on jobs requiring at least one year
of training.


Procedure and Results 


The 41 establishments were ranked on each of the
35 variables listed above, and rank-difference correlations
computed for the 35 × 35 matrix. The cor-


4 More detailed data on raw scores, reliability of 
measures and the statistical properties of the distri- 
butions will be found in a forthcoming monograph, 
“The Local Union-Management Relationship; a com- 
Table 1
Rotated Factor Matrix for 35 Variables 








Factors 





Variables 


5 6 





Autonomy—cont. 
Autonomy—grv. 
Autogeny—cont. 
Autogeny—grv. 
Influence—scope 
Influence—depth 
Press—cont. 
Press—grv. 
Yielding 
Legal—grv. 

Past Prac. Con. 
Past Prac. Grv. 
Mg. Init. Con. 
Speed Grv. 
Emot. Tone Con. 
Emot. Tone Grv. 
Underst. Con. 
Underst. Grv. 
Concede Con. 
Concede Grv. 
Att. Mgt. 

Att. Union 

Sat. Scope M. 
Sat. Scope Un. 
Sat. Grv. Mgt. 
Sat. Grv. Union 
Sat. Depth M. 
Sat. Depth Un. 
Sat. Cont. Man. M. 
Sat. Cont. Man. U 
Sat. W&F Mgt.
Sat. W&F Un. 
Hourly Earnings 
Size 

Skill Ratio 


+18 
—00 


—10 


—03 
—34 
—36 
+13 
+36 
—07 
+14 
+22 
+39 
+49 
+06 
+02 
—20 
—05 
+14 
+35 
—03 
+05 
+43 
+71 
—19 
+12 
+24 
+85 
—19 
+42 
+10 
+15 
+05 


—13 
+16 
+41 
—12 
—08 
+43 
+10 
+63 
+71 
—01 
+26 
+07 
—05 
+31 
+11 
+02 
+06 
—06 
+03 
—18 
+06 
+12 
+05 
—08 


—02 
+19 
—26 
—15 
—10 
—06 
—01 
—03 
+49 
—03 
+83 
+12 
+06 


+34 
+34 
+01 
—22 
+10 
—01 
—11
+45 
+06 
+15 
—33 
—40 
+27 
—65 
+29 
—11
—01 
+09 
+14 
—16 
+02 
—13 
—21 
+09 
—16
—10 
—04 
—38
+11 
+09 
+07 
+12 
+20 
—16 +77 
+22 —12 


+54 
+29 
—04 
+06 
—26 
+29 
—06 
+01 
—33 
+32 
+17 
—01 
—24 
—03 
—47 
—01 
—15 
—00 
—40 
—04
—21 
—14 
+26 
+14 
+07 
—14 
—07 
+18 
—46 
+16 
—42 
+08 
+07 
—18 
+82 


—03 
+04 
—21 
+07 
+10 
+14 
+00 
+13 
—23 
—16 
—09 
+04 
—03 
+20 
+13 
+31 
—15 
+25 
+21 
—02 
+38 
—19 
+77 
—19 
+25 
+21 
+72 
—18 
+07 
—14 
+62 
+05 


—06 
—06 
+13 
+60 
+43 
+75 
—04 
+01 
—01 82 
+41 72 
—06 84 
—03 78 
+13 78 
+06 74 
+20 82 
+11 82 
+02 83 
+04 78 
+13 74 
—16 66 
—02 81 
—01 73 
—03 77 


—02 
+12 
+42 
—10 
+20 
+15 
+02 
+01 
+03 


—19 
+05 
+04 
+16 
+14 





relation matrix was then factor-analyzed by the 
principal axes method, and the 10 factors chosen 
which accounted for most of the common variance. 
An orthogonal rotation was applied, using a ma- 
chine method which gives a good approximation to 
simple structure.5 
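The rotation step can be illustrated with a short sketch. The text says only that an orthogonal machine rotation was applied, and footnote 6 names it as Quartimax. The pairwise planar-rotation procedure below is a textbook formulation chosen for illustration, not the Illiac routine itself; the function names and the small loading matrix are hypothetical.

```python
import math


def quartimax(loadings, sweeps=50, tol=1e-10):
    """Orthogonally rotate a loading matrix (rows = variables, columns = factors)
    so as to increase the quartimax criterion: the sum of fourth powers of loadings."""
    rotated = [row[:] for row in loadings]
    n_factors = len(rotated[0])
    for _ in range(sweeps):
        largest_angle = 0.0
        for p in range(n_factors - 1):
            for q in range(p + 1, n_factors):
                # For each variable, u + iv is the square of (x + iy),
                # where x and y are its loadings on factors p and q.
                num = den = 0.0
                for row in rotated:
                    u = row[p] ** 2 - row[q] ** 2
                    v = 2 * row[p] * row[q]
                    num += 2 * u * v
                    den += u * u - v * v
                phi = math.atan2(num, den) / 4  # angle maximizing the criterion for this pair
                largest_angle = max(largest_angle, abs(phi))
                c, s = math.cos(phi), math.sin(phi)
                for row in rotated:
                    x, y = row[p], row[q]
                    row[p], row[q] = x * c + y * s, -x * s + y * c
        if largest_angle < tol:
            break
    return rotated


def fourth_power_sum(loadings):
    """Quartimax criterion."""
    return sum(a ** 4 for row in loadings for a in row)


# Hypothetical two-factor example: a simple structure turned by 30 degrees.
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]
t = math.radians(30)
mixed = [[x * math.cos(t) - y * math.sin(t), x * math.sin(t) + y * math.cos(t)]
         for x, y in simple]
rotated = quartimax(mixed)
print(fourth_power_sum(mixed), fourth_power_sum(rotated))
```

Because the rotation is orthogonal, each variable's communality (its row sum of squared loadings) is unchanged; only the distribution of loadings across factors shifts.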


parative analysis of dimensions and types,” to be
published by the University of Illinois Press. Rank- 
difference coefficients were used in this analysis be- 
cause of skewing of some distributions. Copies of 
the table of intercorrelations can be obtained by 
writing to Milton Derber. 
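As an illustration of the rank-difference coefficient mentioned in this note, the following sketch computes Spearman's rho from the classical formula 1 - 6 Σd² / n(n² - 1). The tie handling (average ranks) and the five-establishment scores are assumptions made for the example; the source does not say how ties were treated, and the formula is exact only when there are none.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_difference_correlation(x, y):
    """Spearman's rho from squared rank differences."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))


def correlation_matrix(columns):
    """Square matrix of rank-difference correlations among all variables."""
    return [[rank_difference_correlation(a, b) for b in columns] for a in columns]


# Hypothetical scores of five establishments on three variables.
columns = [
    [3, 1, 4, 2, 5],
    [2, 1, 5, 3, 4],
    [5, 4, 1, 3, 2],
]
matrix = correlation_matrix(columns)
for row in matrix:
    print([round(r, 2) for r in row])
```

With 35 variables in place of three, the same function would yield the 35 × 35 matrix that was factored.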

5 We wish to express our appreciation to Kern
Dickman and other members of the Computer Laboratory
staff at the University of Illinois for assistance
in setting up and running the data through Illiac,
the electronic computer.


The results of this treatment are shown in Table 
1. The factors are arranged in order of size, i.e.,
proportion of the common variance accounted for. 
Thus Factor 1 accounts for about 15% of the vari- 
ance which is involved in these 10 factors, and this 
proportion decreases to Factor 10, which accounts 
for only about 6%. The only important interpreta- 
tion which can be derived from this fact is that no 
single dimension (or set of two or three dimensions) 
accounts for most of the differences among our 41 
establishments. Most of the factors derived from 
the analysis contribute enough to the obtained dif- 
ferences to merit retention, and, as will be shown 
below, they fall into quite meaningful patterns. It 
would appear that a theory of union-management 
relations which stresses economic benefits, or atti- 







tudes, bureaucracy, or mutual understanding, will be 
too simple. Many dimensions must be taken into 
consideration in a satisfactory picture of the rela- 
tionship. 
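The proportions cited in this paragraph (about 15% for Factor 1, falling to about 6% for Factor 10) are shares of the variance carried by the retained factors. A minimal sketch of that arithmetic follows, assuming the usual definition: a factor's share is its column sum of squared loadings divided by the total over all retained factors. The three-variable loading matrix is hypothetical and is not taken from Table 1.

```python
def factor_shares(loadings):
    """Share of the common variance attributable to each factor."""
    n_factors = len(loadings[0])
    column_ss = [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]
    total = sum(column_ss)
    return [ss / total for ss in column_ss]


def communalities(loadings):
    """Variance of each variable accounted for by all retained factors together."""
    return [sum(a * a for a in row) for row in loadings]


# Hypothetical loadings: three variables on two factors.
example = [
    [0.6, 0.0],
    [0.8, 0.0],
    [0.0, 0.5],
]
print(factor_shares(example))
print(communalities(example))
```

With 10 factors, an even split would give each factor 10%; the reported range of roughly 15% to 6% is therefore fairly flat, which is the basis for the remark that no single dimension dominates.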

Factor 1: Management satisfaction. Some support 
for those who have stressed the attitudinal compo- 
nent as the defining characteristic of local relation- 
ships is found in Factor 1. This factor coincides 
rather closely with the scale scores for management 
approval of the union, and for management satis- 
faction with scope and depth of union influence. 
The three other managerial satisfaction scores are 
also represented here. It is not surprising that re- 
sort to pressure tactics in both contract negotiations 
and grievance settlement is relatively rare (negative 
loadings) in establishments which are high on Fac- 
tor 1; this lack of pressure may play a causal role, 
of course, in determining the favorable management 
attitude. (Our data do not permit a causal interpre- 
tation, since they are purely cross-sectional.) It is 
likewise appropriate that establishments high on this 
factor should report pleasant emotional tone in both 
contract and grievance discussions. 

Factor 2: Local settlement of disputes. The next 
most important factor seems to be one of local 
settlement of disputes. The highest loading is on the 
use of arbitrators on grievances (which we have re- 
ferred to as an element in autogeny) and autonomy 
in grievance settlement is very close. Willingness to 
make concessions in contract and grievance settle- 
ments and understanding of basic intentions in nego- 
tiations are also highly loaded on this factor. These 
may well contribute to the process of reaching a 
settlement locally. 

Establishments high on this factor are also faster 
in grievance speed, and management is more satis- 
fied with grievance process. These establishments 
report less pressure in negotiations, less yielding to 
pressure, and less legalism in settlements. 

Factor 3: Union satisfaction with relations. Man- 
agerial satisfaction proved to be a single dimension 
with relatively little confusion as to its implications 
(Factor 1). Union satisfaction scores, however, prove 
to be clearly separable into two components, which 
appear to be, respectively, satisfaction with the gen- 
eral interaction—perhaps the interpersonal relations 
—with management, and satisfaction with union 
achievements. Once such a finding appears, its logic 
becomes obvious. Union officers may be quite satis- 
fied on one of these categories, and little or not at 
all content with the other. 

High loadings here are on union satisfaction with 
contract manner, union satisfaction with the griev- 
ance procedure, and to lesser extent, union satisfaction 
with wages and benefits, and union attitude. Surprisingly
enough, management satisfaction on grievances
also has a sizeable loading here. It suggests
that the union perception of management, as re- 
flected in this kind of factor, favors a relatively 
peaceful grievance procedure, even if conflicts de- 
velop at negotiation time. 

Factor 4: Union achievement. This factor may 
represent the achievements of the local union, or it 


may be confounded with the effects of being part of 
a given industry. It is almost identical with rank 
on hourly earnings (loading of +.83); however, the 
very high loadings with union influence scope and 
depth (both +.67) and the substantial emphasis on 
managerial satisfaction with wage level and fringe 
benefits indicate that both general union strength 
and an industry wage pattern may be involved.
The norms of both parties as to acceptable levels of 
wages and of union influence will, of course, be 
affected by the pattern in the industry, as well as 
by community and other external variables. 

Factor 5: Bargaining style. While this factor at 
first glance seems lacking in unity, there seems to be 
a plausible interpretation. Establishments high on 
this factor would appear to be those in which the 
bargaining process is quite active, with management 
making many proposals to the union (+.77), with 
considerable pressure and threats (+.48 and +.40), 
with a fair amount of yielding in response to pres- 
sure (+.50). A tendency to appeal to past practices 
rather than to rely on legalisms seems to be asso- 
ciated with this fluid bargaining situation. 

Factor 6: Skill of work force. This factor is most 
closely related to the skill ratio for the production 
workers. It would suggest an industry rather than 
a local determinant. Establishments high on the factor
have considerable autonomy on contracts (+.54).
Evidently high-skill establishments have considerable 
friction when negotiating contracts; the loadings on 
emotional tone, on conceding in negotiations, and on 
managerial satisfaction are negative and fairly large. 
Union satisfaction has no apparent relation to this 
factor. 

Factor 7: Union satisfaction with achievement. As 
noted above, union satisfaction splits into two inde- 
pendent dimensions. The union officers may be 
satisfied with the economic benefits and contract 
privileges won for their members, even if dissatis- 
fied with the way the relationship functions from 
day to day (since these are orthogonal factors, they 
vary independently). Factor 7 is highly loaded with 
the satisfaction felt by union officers regarding scope 
and depth of influence, as well as satisfaction with 
wages and fringes. The loading on this latter vari- 
able is considerably larger than for Factor 3. Union 
attitude to management, however, is about equally 
related to Factor 3 and Factor 7; this may indicate 
that generalized approval derives from both varieties 
of satisfaction. 

Factor 8: Size. The two principal loadings here 
are on objective size and on speed of grievance settle- 
ment (negative), which may well be dependent on 
size of establishment. The correlated variables (au- 
tonomy, pressure regarding grievances, failure to rely 
on past practice) are also of such a nature as to be 
readily affected by size. However, this may not be 
the best label for the factor. 

Factor 9: Legalism. This factor is the most difficult
to name. It has only one heavy loading, representing
the successful conclusion of contract arguments
without mediation. The next largest loadings
are on legalism in grievance handling (positive)
and on avoidance of past practice as a guide; and we have
speculated that an attitude favoring the orderly
handling of disagreements, abiding by contracts, etc.,
might underlie this factor.

Factor 10: Effective grievance handling. This fac- 
tor is characterized by two high loadings on vari- 
ables related to grievances. Reported understanding 
of the other party’s intentions and favorable emo- 
tional tone in grievance handling seem to define this 
dimension as one of effective grievance procedure. 
It is positively related to union attitude, but not to 
management’s attitude, which is plausible since un- 
ion officials are necessarily much more ego-involved 
in grievances than are managers. Somewhat puzzling 
is the tendency for high use of pressure during con- 
tract negotiations to go with the favorable grievance 
situation. 


Discussion 


Before considering the question of types 
versus dimensions, let us note certain general 
aspects of the findings. First of all, it seems 
clear that factor analysis, a method for identi- 
fying one or more independent linear dimen- 
sions which best describe several measured 
variables, produces meaningful results when 
applied to union-management relations. Stu- 
dents familiar with this area of study will find 
that most of the 10 factors pick up aspects 
which have been mentioned in the literature 
as important in one connection or another. 
The factored data, however, present certain
advantages as compared with the direct observation:
(a) We find that certain variables
which appear discrete are not independent
and can appropriately be combined. (b) Certain
variables which appear to belong together
are shown to be independent. (c) The
extent to which an establishment’s “score” on
a dimension is related to each of several variables
can be determined quantitatively.

Another interesting point relates to the 
variables, size, hourly earnings and skill ratio, 
thrown into the analysis. These three vari- 
ables were not determined by responses to 
questions about the relationship; they are, 
rather, independent of the process variables 
which were our chief interest. Each of these 
independent variables turned up with a very 
large loading in a single factor, an outcome 
by no means inevitable in terms of the 
method. Furthermore, not one of these vari- 
ables showed a substantial loading on any 
other but its own factor; their effects, in 


other words, were relatively unambiguous.6
It seems certain that the psychological vari- 
ables such as attitude and satisfaction will al- 
ways have to be interpreted within a context 
influenced by these objective elements. 

It would be tempting to interpret the po- 
sition of Factor 1 (Management Satisfac- 
tion), accounting for more variance than any 
other factor, in terms of the decisive effect of 
management attitudes upon the relationship. 
This would in part be spurious, however, since 
we have loaded our investigation with vari- 
ables of an interpersonal character—under- 
standing, conceding, yielding, emotional tone, 
satisfaction. Within this group, managerial 
attitudes and satisfactions may well be the 
most significant. Had a different combina- 
tion of variables been used, some other fac- 
tor might have the largest loadings. But it 
seems safe to conclude that management atti- 
tude and satisfaction constitute a major di- 
mension of the relationship. 

Finding two dimensions instead of one for 
union satisfaction seems to us important. 
Psychologists are prone to stress perception 
of the relationship as a unit, and to predict 
that satisfaction will be likewise unitary. It 
is clear, at least in these data, that satisfac- 
tion on the part of union officers must be 
treated as two dimensions, one relating to 
union contractual accomplishments and the 
other to the daily interactions with manage- 
ment. This observation clears up some con- 
fusions and can forestall others. 

Similarly, it is worthwhile noting that Fac- 
tors 4, 6, and 8 seem each to be related to 
“industry” characteristics, but in our sta- 
tistics they behave independently. It would 
have been easy to think of “industry” as a 
single variable, but it appears in these data 
that at least two are involved: an industry 
characteristic relating to earnings and union 
influence, and another involving skill ratio. 

Finally, it should be emphasized that we 
do not allege that we have isolated the 10 


6 Students of factor-analytic methods may also be 
interested in the observation that these three vari- 
ables had 10 loadings above .30 in the unrotated 
matrix of 10 factors; but after Quartimax rotation, 
there were only 3 such loadings, each of them much 


larger. This supports the view that rotation leads
to more meaningful results than the original factor
pattern.
fundamental dimensions of the local labor-management
relationship. It is quite possible
that further studies will show that Factors 5, 
8, and 9 are inadequately identified, and that 
Factors 6 and 10 include more than we have 
indicated. The addition of more variables 
may clarify the significance, in terms of in- 
dependent determinants or of dependent con- 
sequences, of all the dimensions. However, 
this analysis certainly gives us a good pic- 
ture of several dimensions which are not 
likely to be discarded by further research, 
particularly 1, 2, 3, 4, and 7. 

Types vs. dimensions. Let us now comment
briefly on the general problem of developing
“types” of union-management relations,
as compared with efforts to identify
underlying dimensions. It was demonstrated
in our previous article (1957) that empirical
types can be identified and that they show
meaningful differences in variables not used
to define the type. The types employed were
based on three variables: union influence
(scope plus depth), pressure (contract pressure,
grievance pressure, and yielding to pressure)
and attitudes (management and union).
Thus, 7 of the 35 variables studied factorially
were used to define the types. The four main
types or clusters located were characterized as
follows: union-dominated, management-hostile;
management-dominated, union-acquiescent;
management-dominated, union-hostile;
and high union influence, mutually friendly.
An examination of the data on the other
variables led to the conclusion that “the value
of a factor may be significantly different when
associated with one cluster than with another.”
For example, union influence is high
in both Cluster A and Cluster E, but the
former includes pressure and hostility, the
latter neither. Both show high reliance
on past practice in grievance settlement,
but, it seems clear, for different reasons.
Or, to take Clusters B-C and E: emotional
tone is high (favorable) in both, but
Cluster B-C is composed of weak unions apparently
grateful to a kindly management,
while Cluster E includes strong unions which
appear friendly but not submissive. Thus the
typological analysis seems valuable in that it
calls attention to the varying significance of
a specific variable when it is set in the context
of one or another type of relationship.

Against this it must be observed that the 
dimensional analysis reported here also brings 
out hidden significances in the raw data. We 
have noted the separation of union satisfac- 
tion into two components. The attitude of 
union officials toward management is posi- 
tively correlated with both of these factors; 
but obviously this says that favorable union 
attitude may be related to high achievements 
(as in Cluster A) or to good treatment (as in 
Cluster B-C). Furthermore, we find that fa- 
vorable union attitude is also related to Fac- 
tors 9 and 10; Factor 10 seems related to 
Cluster E, but Factor 9 has no counterpart 
in the type data. Hence, it is quite possible
that we may get farther by fractionating a 
complex component like union attitude by 
the technique of dimensional analysis than 
we can simply by studying its place in em- 
pirically chosen clusters. 

Emotional tone in contract negotiations is 
positively loaded on Factors 1 and 3 (Satis- 
faction of Management and of Union) but 
negatively on Factor 6, an industry factor. 
This finding indicates that we should not 
treat emotional tone as a single, unambiguous 
fact, that it has different meanings in the con- 
text of establishments high on 6 as against 
those high on 1 or 3. 

In other words, multivariate analysis 
(whether by factor analysis, as in the present 
study, or by typology, as in our earlier ar- 
ticle) seems necessary for a full exploitation 
of the data. Univariate analysis, as in com- 
parison of establishments for frequency of 
strikes or for attitudes of conflict, is inade- 
quate. The meaning of a single score varies 
substantially according to the context, the 
kind of union-management relationship within 
which it occurs. 

For these reasons, we cannot conclude that 
dimensional analysis based on factors is defi- 
nitely superior to typing, or vice versa. Both 
of these multivariate techniques, however, 
seem superior to univariate methods. 


Summary 


1. Forty-one establishments were ranked
on 35 variables (aspects of the union-management
relationship), rank-difference correlations
computed, and the matrix factor-analyzed.

2. Ten dimensions accounted for most of 
the common variance. Brief interpretations 
were offered for these 10. In some cases, an 
assumed single variable (such as union satis- 
faction) was found to be a complex function 
of several underlying dimensions. 

3. Three variables, size, hourly earnings, 
and skill ratio, determined three independent 
factors, despite the fact that they were out- 
numbered by the variables descriptive of the 
relationship. 

4. A comparison of typological analysis 
with a dimensional approach based on fac- 
tor analysis indicates that each increases the 
amount of information above that derived 
from a univariate analysis. Single variables 
receive differing interpretations according to 
the context (type or factor structure) within 
which they occur. Multivariate analysis is 


judged superior to univariate analysis for this 


kind of study, but it is not feasible to state 
that typological or factorial technique is defi- 
nitely superior. It seems possible that both 
techniques could profitably be utilized in a 
single research design. 


Received May 6, 1958. 
Since the development of the Scale of Manifest
Anxiety by Taylor (1951) this instrument
has been employed in two forms (Taylor,
1953). One form consists of only the 50
items, all of which contribute to the anxiety 
scale. The other form consists of a 225-item 
scale which, in addition to the 50 anxiety 
items, includes a number of buffer items. 

Many of the recent studies concerned with 
anxiety have used the shortened scale as the 
measure of anxiety because of the correlation 
found to exist between the two forms. This 
correlation ranges from .68 as reported by 
Taylor (1953) to .95 reported by McCreary 
and Bendig (1954). 

Many of the recent studies employing this 
short form as a measure of anxiety have also 
used college students as subjects. It was felt 
that the scale, particularly when used with 
college students, might be so transparent that 
its validity would depend on the motivations 
of the subject. (If this were true, the results 
of recent studies employing this scale may 
need to be re-evaluated.) There are two as- 
pects of the question. One is: can students 
alter their scores in a desired manner when 
so directed? The other is, will students alter 
their scores in other situations when not spe- 
cifically directed to do so? 

The purpose of this study was to investi- 
gate only the former question of the trans- 
parency of the short form of the Taylor Scale 
of Manifest Anxiety in a college population. 


Procedure 


A total of 190 undergraduate students in an intro- 
ductory psychology course at Iowa State College 
were used as Ss. These Ss were randomly assigned 
to five equal groups of 45. As a result of absentees 
on the days the tests were administered, the original 
groups (n = 45) were reduced in size. Since the
smallest group had 38 Ss, individuals were randomly 


eliminated from the other groups until all groups had 
38 Ss. This equalization of groups was done for 
convenience in statistical analyses. Three weeks after 
the beginning of the course, the A-scale was adminis- 
tered to the entire group. The only differential treat- 
ment among the groups was in the printed instruc- 
tions attached to the front of their test booklets. 
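The assignment and equalization steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the subject identifiers, the pattern of absentees, the fixed seed, and the function name are all hypothetical, and 225 subjects (5 × 45) are assumed so that five equal groups of 45 are possible.

```python
import random


def assign_and_equalize(subject_ids, attended, n_groups=5, seed=0):
    """Randomly assign subjects to equal groups, drop absentees, then randomly
    eliminate subjects until every group matches the smallest group."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    ids = list(subject_ids)
    rng.shuffle(ids)
    groups = [ids[g::n_groups] for g in range(n_groups)]
    groups = [[s for s in group if s in attended] for group in groups]
    smallest = min(len(group) for group in groups)
    return [rng.sample(group, smallest) for group in groups]


subjects = list(range(225))          # hypothetical identifiers, 5 groups of 45
absent = set(range(0, 225, 9))       # hypothetical absentees
present = set(subjects) - absent
groups = assign_and_equalize(subjects, present)
print([len(group) for group in groups])
```

Random elimination keeps the groups equal in size without favoring any kind of subject, at the cost of discarding some usable records.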

The students were orally cautioned to read these 
instructions carefully before turning to the scale 
proper. During the first administration, three of the 
groups had the usual instructions given when the 
scale is administered under normal conditions (called 
standard administration hereafter). A fourth group 
had instructions to respond as if they were very 
poorly adjusted persons. The fifth group had in- 
structions to respond as if they were very well-ad- 
justed persons. A month later the scale was read- 
ministered to the same groups under the same con- 
ditions as the first administration except that the in- 
structions were altered for all but Group 1, or the 
control group as indicated in Table 1. Both ad- 
ministrations were completed prior to any class dis- 
cussion of anxiety. 


Statistical Method 


In order to analyze the effect the differential 
instructions had upon the various group per- 
formances several analyses of variance were 
computed as indicated in Table 3. 
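For readers unfamiliar with the technique, a minimal sketch of a one-way analysis of variance is given here. The source does not state the exact design of each analysis, so a simple between-groups F ratio is assumed; the function name and the three small groups of scores are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way analysis of variance."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical scores for three groups of three subjects.
f_ratio, df_b, df_w = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(f_ratio, df_b, df_w)
```

With five groups of 38 subjects, the same computation would have 4 and 185 degrees of freedom.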

Pearson product-moment correlations were 
computed between the first and second ad- 
ministrations of the five groups. 
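The product-moment coefficient between two administrations can be sketched as below. The t statistic for testing a correlation against zero is included as one conventional check; the source does not say which significance test was used, so that part is an assumption, as are the function names and the sample scores.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between paired scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


# Hypothetical first- and second-administration scores for six subjects.
first = [12, 18, 9, 22, 15, 17]
second = [10, 17, 11, 20, 13, 18]
r = pearson_r(first, second)
print(round(r, 2), round(t_for_r(r, len(first)), 2))
```

With 38 subjects per group, a coefficient near zero gives a t close to zero, while a coefficient of .72 gives a t above 6.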


Table 1


Group Arrangement and Differential Instructions


         Instructions on the     Instructions on the
Group    First Administration    Second Administration

1        Standard                Standard
2        Standard                Poorly Adjusted
3        Standard                Well Adjusted
4        Poorly Adjusted         Standard
5        Well Adjusted           Standard
Table 2


Means, Variances, and Standard Deviations of Groups for Both Administrations 


First Second 
Adminis- Adminis- Mean Significance 
Groups tration tration Difference Level 





1. n = 38 16.55 14.61 1.94 <.05
(Standard-Standard) 36.09 45.81 


6.01 6.77 


2. n = 38 13.13 39.79
(Standard—Poorly Adjusted) 37.58 154.39 
6.13 12.43 


3. n = 38 14.55 6.68
(Standard—Well Adjusted) 17.19 


4.14 


4. n = 38 10.55
(Poorly Adjusted—Standard) 38.47 


6.20 


5. n = 38 2. 16.05 
(Well Adjusted-Standard) . 70.92 


8.42 


Table 3


Summary of All Analyses of Variance Computed 


Difference Between 


First Administration Second Administration 


Group 1 (Standard) and Group 1 (Standard) 
Group 4 (Poorly Adjusted) and Group 2 (Poorly Adjusted) 
Group 5 (Well Adjusted) and Group 3 (Well Adjusted) 16.79 


All 5 Groups -— ai 105.49 


Groups 1, 2 and 3 
(All Standard) 2.13 
Groups 1, 2 and 3 
(All Standard) 
and 
Groups 4 and 5 (Poorly and 
Well Adjusted) 
Group 4 (Poorly Adjusted) 
and 
Group 5 (Well Adjusted) 289.31 
Groups 1, 4 and 5 
(All Standard) 6.03 
Group 1 (Standard) and 
Groups 4 and 5 (Standard) 


Group 4 (Standard) and 
Group 5 (Standard) 
Table 4


Correlations Between the Two Administrations
of the Various Groups


Group (Instructions)                 Correlation

1. (Standard-Standard)                   0.72
2. (Standard-Poorly Adjusted)           -0.04
3. (Standard-Well Adjusted)             -0.04
4. (Poorly Adjusted-Standard)            0.03
5. (Well Adjusted-Standard)              0.63


Results 


1. Group means, variances, standard devia- 
tions. A summary table of the means, vari- 
ances, and standard deviations for the first 
and second administrations of the five groups 
is shown in Table 2. As indicated in Table 2, 
the mean of the control group (Group 1) de- 
creased 1.94 from the first to the second ad- 
ministration. The means of the remaining 
four groups all changed in the expected di- 
rection. 

2. Analyses of variance between groups. A 
summary of all the analyses of variances 
which were computed is shown in Table 3. 

3. Coefficients of correlation between both 
administrations of all groups. Pearson prod- 
uct-moment correlation coefficients computed 
between the first and second administration of 
the five groups are shown in Table 4. 

Of the correlations shown in Table 4, only 
two were significantly different from zero. 
One of these was the correlation between the 
two administrations to Group 1, which re- 
ceived standard instructions on both adminis- 
trations. 
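The significance reported for the control group's mean decrease can be checked roughly from the summary figures alone. The sketch below assumes that a t test for correlated means was the test applied (the text does not say), and it takes the Group 1 variances from Table 2 and the correlation from Table 4; the function name is illustrative only.

```python
import math

def paired_t_from_summary(mean_diff, var_first, var_second, r, n):
    """t statistic for the change in mean between two correlated administrations.

    Assumption: the variance of the difference scores is reconstructed as
    var1 + var2 - 2 * r * sd1 * sd2, which holds for any paired scores.
    """
    covariance = r * math.sqrt(var_first * var_second)
    var_diff = var_first + var_second - 2.0 * covariance
    standard_error = math.sqrt(var_diff / n)
    return mean_diff / standard_error

# Group 1 (Standard-Standard): n = 38, mean decrease 1.94,
# variances 36.09 and 45.81 (Table 2), r = 0.72 (Table 4).
t_group1 = paired_t_from_summary(1.94, 36.09, 45.81, 0.72, 38)
print(f"t = {t_group1:.2f} with {38 - 1} degrees of freedom")
```

With 37 degrees of freedom the two-tailed .05 critical value of t is approximately 2.03, so a t of about 2.5 is in line with the "<.05" entry for Group 1 in Table 2.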


Discussion 


The differential administrative instructions 
employed in this study had a very definite 
effect upon test scores. The mean score for 
these groups of college students on the short 
form of the A Scale under standard adminis- 
trative instructions ranged from 10-16. When 
the Ss were instructed to appear poorly ad- 
justed, the mean scores rose to about 40. 
Both groups which received instructions to 
appear well adjusted had means which were 
less than the means of any of the groups 


which received the standard instructions on 
the first administration. 

The groups which received instructions to 
appear poorly adjusted were able to alter their 
scores much more than the groups which re- 
ceived instructions to appear well adjusted. 
There are several possible explanations for 
this greater shift. First, since the means of 
the standard groups were much closer to zero 
(minimum anxiety) than they were to 50 
(maximum anxiety), there was simply less 
room for the groups receiving instructions to 
appear well adjusted to alter their scores 
downward. A second possible explanation 
might be that the Ss on the standard ad- 
ministrations were already making some at- 
tempt to appear well adjusted. There is also 
the possibility that the Ss on the stand- 
ard administrations were, in truth, unusually 
well adjusted. The writers assumed that this 
was not the case. This lack of a significant 
difference between the groups which received 
the standard instructions and the groups 
which received instructions to appear well ad-
justed on the first administration seems to in-
dicate that even those Ss who received stand-
ard instructions were attempting to appear
favorably. The finding that the group which 
received instructions to appear well adjusted 
on the first administration (Group 5) did not 
appear as well adjusted as the group which 
received the same instructions on the second 
administration (Group 3) may be attributable 
to prior experience with the A Scale. Group 3 
had, at the time of the second administration, 
already taken the A Scale once with standard 
instructions. As a result of this prior experi- 
ence, they may have remembered some of the 
responses they had made on the standard ad- 
ministration and altered them in the well-ad- 
justed direction. Also, as was indicated in 
the control group (Group 1) which received 
standard instructions on both administrations, 
there appeared to be a significant decrease in 
scores from the first administration to the 
second. This decrease may be a result of de- 
creased self-criticism. 

There was no significant difference between 
the two groups which received instructions to 
appear poorly adjusted. It would seem, there- 
fore, that the concept of poor adjustment is 
more clearly conceived than that of good ad-
justment.

The score an S made on the A Scale on the 
second administration was not a function of 
his score on the first administration except for 
Groups 1 (standard-standard) and 5 (well 
adjusted-standard). For Group 1, a high cor- 
relation between administrations would be ex- 
pected since adequate test-retest reliability for 
the A Scale has been demonstrated. The cor- 
relation in this case was .73. Why Group 5 
(well adjusted-standard) showed such a high 
correlation (.63) between administrations is 
not so apparent. This correlation may simply 
be spurious due to sampling error. It has, 
though, been suggested earlier that an S tak- 
ing the A Scale with standard instructions 
might make an attempt to appear better ad- 
justed than he, in truth, actually is. Since 
Group 5 on the first administration had re- 
ceived instructions to appear well adjusted, 
there may be a systematic perseveration from 
this first administration to the second when 
they received standard instructions. The least 
that can be said is that there was some- 
thing systematically occurring within this 
group which may be determined by further 
research. 


Summary and Conclusions 


This study attempted to evaluate the trans- 
parency of the short form of the Taylor Scale 
of Manifest Anxiety among a group of col- 
lege students. Five groups of Iowa State Col- 
lege students were given two administrations 
of the scale with instructions to appear either 


well-adjusted, poorly adjusted, or to take the 
scale honestly. Analyses of the various group 
statistics and selected item statistics led to the 
following conclusions. 

1. A preconceived set, in this case the in- 
structions to appear well or poorly adjusted, 
had a definite effect on the total score on the 
scale. 

2. There is some evidence to support the 
belief that, even under standard instructions, 
Ss made an attempt to appear well adjusted. 

3. The second standard administration of 
the test to the same group resulted in a sig- 
nificant decrease in mean score which may be 
indicative of a decrease in self-criticism due 
to prior experience with the test.

This study indicated that the short form of 
the Taylor Scale of Manifest Anxiety should 
be used with caution. Because of the scale’s 
transparency, it does not appear to be an in-
strument sophisticated enough to be adminis-
tered to college or university Ss without a lie
or suppressor scale, particularly when used in
any situation in which the S might be moti-
vated to alter his score in a desirable direction.


Received February 10, 1958. 
Indirect measurement of attitudes and in-
terests has long been the goal of psychologi-
cal measurement. Direct questionnaire ap-
proaches have the disadvantage of fakeabil-
ity and are subject to undesirable influences
from response set and the varying social ac- 
ceptability of items. Indirect approaches by 
means of projective techniques have been 
handicapped by subjectivity, scoring difficul- 
ties, and problems of interpretation. As an 
approach combining the advantages of these 
two, the suggestion has been made that the 
well-known halo effect in ratings be used to 
measure attitudinal and interest characteris- 
tics of the rater (Campbell, 1950). The in- 
direction of such an approach could be of use 
in studying “public” vs. “private” and per-
haps “conscious” vs. “unconscious” reactions,
as well as in predicting subsequent behavior 
from test responses. At the same time, re- 
sponses would be objective and unambiguous 
in regard to scoring. 

Flyer (1951) devised a test to capitalize 
on the bias of the rater in evaluating pictures 
of people. Sixty-four pictures (of men for 
men, and of women for women) are judged 
twice by the S, the first time to select from 
each group of eight pictures the two best and 
the two least liked, and the second time to 
place each picture in one of a predetermined 
set of categories. The rationale is that, be- 
cause of the relatively unstructured nature of 
the stimulus material, the S’s responses will 
be largely determined by the sort of atti- 
tude and identification factors which are im- 
portant in projective techniques, while the 
method of responding has the scoring advan- 
tage of questionnaire-type instruments. 

In preliminary studies Flyer obtained 
promising results. College men and women 
selected the best and least liked from each 
set of pictures, then assigned each picture 
to one of eight behavior categories. An in- 
dependent rating of those categories for 
acceptability-nonacceptability was obtained 


from the same students, using a check list. 
There was a consistent tendency for pictures 
of individuals judged to have unacceptable 
characteristics to be selected as “least liked” 
and for those judged to have acceptable char-
acteristics to be selected as “best liked,” dem-
onstrating a nonchance relationship between
the two types of judgments made about the 
pictures, though the Ss were not aware of any 
connection. A repetition of this study with 
an Air Force officer candidate group gave
similar results. In addition, for this group 
self-ratings on the behavior variables were 
obtained, and a significant tendency was 
noted for Ss to “like” pictures of individuals 
they rated high on variables on which they 
rated themselves high, and to “dislike” indi- 
viduals whose high trait ratings were on vari- 
ables different from their own. The simple 
affective response to a picture is, then, re- 
lated to the self concept of the viewer (Flyer, 
1952). 

Chambers (1957), using a similar tech- 
nique, reported positive correlations between 
responses to college annual photographs and 
measures of lack of inferiority, ascendancy, 
and self-assertiveness for 18 women and 15 
men undergraduate students. 


Hypotheses 


An area in which indirect measurement of 
attitude seems advantageous is that of in- 
group feeling or group identification. In- 
formal observation suggests that Flyer’s Pic- 
ture-Choice technique would be useful here: 
convention behavior, for example, suggests 
that people tend at first glance to react quite 
differently to those perceived as members of 
their own fraternal or occupational group than
to those not so perceived. In this study the
prediction is made that college students will 
tend to “like” pictures of individuals per- 
ceived as participants in the occupation to 
which they aspire and with which they have, 
Table 1


Mean Picture-Choice Scores of the Six College Major* Groups on Each Occupational Category


                                        College Major Group
Occupational
Category              N     Bus.    Engr.   F. Arts   Educ.   Soc. Sci.   Phy. Sci.

Business             33     1.84    -.18      .50      .09      -.24        -.36
Engineering          32      .78    2.50     1.50      .52       .08         .25
Fine Arts/Literary    8    -1.27   -1.50     2.25    -1.11       .22        -.63
Education            53      .00     .25      .50     1.56       .66         .70
Social Sciences      57    -1.00    -.53    -1.00     -.&       1.42         .27
Physical Sciences    44     1.15     .93     -.12      .60       .10        1.61


* The sample includes no agriculture majors and the university from which the sample was taken does not have a department
of agriculture.


then, at least partially identified. Further, 
the extent of this rating bias should be in-
dicative of the strength of identification with,
or expressed interest in, the occupational field.


Procedure 




Subjects were 227 male college students, freshmen 
through first-year graduate students. The pictures 
of the Form for Men of Flyer’s Picture-Choice Test 
were used, with the occupational categories: 


Military Officer 
Business 
Engineering 

Fine Arts/Literary 


Education 

Social Sciences 
Physical Sciences 
Agriculture 


In small groups of five to 20 the Ss were given 
Picture-Choice booklets and four answer sheets. The 
first sheet contained a brief introduction to the test 
and spaces for name, college major, and college class. 
The second contained the instructions: 


A. Turn to page 1 of the Picture booklet. Look 
over the faces of the people shown there. Pick out 
the pictures of the two people you like best for any 
reason at all. Place the numbers of the liked pic- 
tures in the squares at the right 


B. Look over the pictures again and pick out the 
two you like least (dislike most). Place the num- 
bers of these disliked pictures in the squares at the 


On the third page the S classified the pictures into 
the eight occupational categories. On Sheet 4, he 
first indicated the degree of his interest in each of 
the occupations by placing a check mark in the ap- 
propriate column: highly interested, somewhat inter- 
ested, somewhat disinterested, and highly disinter- 
ested. He then ranked the occupations in terms of 
his interest, from first to eighth; and last he circled 
the occupational title most closely related to his 
college major. 


Data Treatment and Results 


Tests were scored by summing algebraically 
the numbers of liked and disliked pictures in 
each category. Mean scores obtained for the 
college majors most closely related to the oc- 
cupations listed are given in Table 1. Each 
college major group was compared with the 
five other groups combined, and chi square 
was computed as a test of independence. 
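The scoring rule just described can be sketched as follows. This is a minimal illustration, assuming that each liked picture adds one point and each disliked picture subtracts one point from the category in which the S placed it; the picture numbers and placements in the example are hypothetical, and the function name is not from the original study.

```python
CATEGORIES = [
    "Military Officer", "Business", "Engineering", "Fine Arts/Literary",
    "Education", "Social Sciences", "Physical Sciences", "Agriculture",
]

def picture_choice_scores(liked, disliked, classification, categories):
    """Algebraic sum of liked (+1) and disliked (-1) pictures per category.

    liked, disliked: picture numbers chosen by one S.
    classification: picture number -> category the same S assigned it to.
    """
    scores = {category: 0 for category in categories}
    for picture in liked:
        scores[classification[picture]] += 1
    for picture in disliked:
        scores[classification[picture]] -= 1
    return scores

# Hypothetical responses to a single page of the booklet.
example_classification = {3: "Engineering", 12: "Engineering",
                          7: "Business", 5: "Agriculture"}
example_scores = picture_choice_scores(
    liked=[3, 12],
    disliked=[7, 5],
    classification=example_classification,
    categories=CATEGORIES,
)
print(example_scores)
```

A positive score for a category means the S placed more liked than disliked pictures in it; pictures that were neither liked nor disliked do not enter the score.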


Table 2 


Mean Picture-Choice Scores for Each College Major Group on Its Related Occupational Category vs. 
Mean Picture-Choice Scores of Five Other Majors Combined 


Mean Picture- 
Choice Score 


College Major 
Group 


= 


Mean of Other 
Groups Combined 


(p] 





1.84 
2.50 
2.25 
1.56 
1.42 
1.61 


Business 


we Ww 
“Sw Ohm Ww 


Engineering 
Fine Arts/Literary 
Education 


mu 


Social Sciences 
Physical Sciences 


£ 


10.94 
5.56 
64 
5.03 
16.17 
2.80 


001 
02 
50 
02 


—.35 
42 


.29 


.10 
Table 3 


Mean Picture-Choice Scores for 227 Subjects on the Occupational Categories by 
Rank Order of Interest 











Rank Order of Interest 





Occupational 
Category 





Military Officer 
Business 
Engineering 

Fine Arts/Literary 
Education 

Social Sciences 
Physical Sciences 


Agriculture 


These values are presented in Table 2. It is 
apparent that each college major group per- 
ceives as members of the occupation with 
which they identify the men photographed 
whom they like best, and as members of 
other occupational groups those whom they 
dislike. 

For each occupational category, mean 
scores were computed at each ranking of in- 
terest. The tendency for mean score to in- 
crease with degree of expressed interest, which 
is apparent in Table 3, is significant beyond 
the .01 level in a sign test. 
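For readers unfamiliar with the sign test, a minimal sketch follows. How the comparisons were formed in this study is not stated, so the counts below are purely hypothetical; the function simply gives the one-sided binomial probability of observing at least the stated number of increases when increases and decreases are equally likely.

```python
from math import comb

def sign_test_p(n_increases, n_decreases):
    """One-sided sign test: P(at least n_increases successes) under p = 0.5.

    Ties are assumed to have been dropped before counting.
    """
    n = n_increases + n_decreases
    tail = sum(comb(n, k) for k in range(n_increases, n + 1))
    return tail / 2 ** n

# Hypothetical counts: 20 comparisons in which the mean score rose with
# higher interest rank and 5 in which it fell.
p_example = sign_test_p(20, 5)
print(f"p = {p_example:.4f}")
```

A split as lopsided as the hypothetical one above would fall beyond the .01 level, which is the kind of result the text reports.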


Discussion 


It is clear that the halo effect differentiates 
these college groups from each other: they 
tend consistently to perceive “liked” men as 
members of the occupation to which they 
aspire rather than as members of one of the 
other occupations; they tend to perceive more 
“liked” men as members of their chosen oc- 
cupational fields than do students with dif- 
ferent vocational aspirations. 

Also the extent of bias is related to the de-
gree of identification or liking. Fairly con-
sistently the preponderance of liked over dis-
liked pictures placed in the category reflects
the degree of interest expressed by the S in
the occupation.

The development of quantitative norms 
seems entirely feasible for this technique 
which, as has been pointed out (Campbell, 
1950; Chambers, 1957), combines important 
advantages of questionnaire and projective 
approaches, and which is particularly well 
suited to attitude and identification assess- 
ment because of its indirection. In informal 
discussions these students revealed no under- 
standing of the purpose. College majors as 
indicators of vocational objectives are prob- 
ably overlapping and unclear, often unstable, 
and perhaps on some occasions actually con- 
tradictory. Group differences as clear-cut as 
these, using this index, suggest the possibility 
of developing such an instrument to the point 
of individual as well as group usefulness. 


Summary 


Responses of 227 male college students to 
a modification of Flyer’s Picture-Choice Test 
were scored for the halo effect of “liking” on 
judgment of occupation. Clear-cut differ-
ences appeared between college majors, and
the extent of the bias reflected the degree of 
interest expressed in the occupation. With 
refinement, perhaps just to the extent of 
separating college major fields into vocational
subdivisions, this technique may be useful 
for individual use as well as group differ- 
entiation. 


Received March 21, 1958. 
The present study was concerned with
evaluating the relative training effectiveness 
of animated transparencies as compared to 
static transparencies. Previous research by 
Swanson and Aukes (1956) and by Torkel- 
son (1954) dealing with the relative effective- 
ness of moving and static devices has not 
provided conclusive answers. However, the 
Torkelson study (1954) did suggest that 
mock-up and cutaways which permit the dem- 
onstration of movement contribute more to 
the understanding of motion concepts than do 
charts or manual illustrations. Obviously, 
training agencies need more than suggestive 
data in order to choose between types of 
training devices, especially when the choice 
must take into account the issue of cost. 

Although the two earlier studies were not 
conclusive, they do point up possibly im- 
portant variables. One of the variables is 
the motion properties of the device to be 
studied. If, as the Torkelson study suggests, 
animated devices improve the understanding 
of motion concepts, then the number and 
types of moving parts in a device should be 
a relevant variable. We would expect ani- 
mated devices to be more effective than static 
devices when the training situation involves 
devices with many as compared to few mov- 
ing parts. 

A second variable is the method of testing 
for the effectiveness of a given training device. 
In the studies cited above, paper and pencil 
tests were employed. These tests have the 
advantage of being easily administered and 
scored, but they also have the disadvantage 
of relying heavily on verbal factors. While 
it is true that knowledge of a phenomenon 


1 This research was carried out under Contract 
N61339-78, Letter Order No. 2, between New York 
University and the U. S. Naval Training Device 
Center, Port Washington, N. Y. A more detailed 
report is found in NAVTRADEVCEN Technical 
Report 78-1. 

2 The author wishes to acknowledge the valuable 
assistance of Sam Glucksberg and Roy Lachman. 


usually includes translating the phenomenon 
into verbal symbols, it does not follow that 
the type of knowledge imparted in many 
training situations can be measured by means 
of exclusively verbal techniques. It would 
seem necessary to consider the purpose of a 
specific training device. If the purpose is 
to impart knowledge of nomenclature or the 
ability to translate into verbal symbols the 
functioning of the device, then verbal paper 
and pencil tests would be appropriate. If
the purpose is to teach mechanical arrange- 
ments and sequences such that the trainee 
can be given the device to disassemble or be 
given the parts to assemble, then it is prob- 
able that a nonverbal performance test would 
be appropriate. In many instances, devices 
which involve motion fall into the category 
of teaching mechanical systems, and here we 
would expect performance tests to be more 
sensitive indices of training effectiveness than 
the paper and pencil tests. 

The purpose of this study was to compare 
animated and static transparencies using 
three training devices and three methods of 
testing. The three devices differed in the 
number of moving parts and the testing 
methods differed in their emphasis on per- 
formance as compared to verbal techniques. 
It was predicted that the training effective- 
ness of animated devices would be a positive 
function of the number of moving parts in 
the devices. Furthermore, this effectiveness 
would be best demonstrated with performance 
tests. 

Procedure 

Subjects. The trainees were 150 male students 
drawn from the Washington Square College, the 
University College of Arts and Science, and the Col- 
lege of Engineering of New York University. The 
criteria employed in selecting trainees were as fol- 
lows: (a) no previous military experience; (b) no 
previous experience with weapons. The trainees 
were volunteers and were paid $1.50 for their time. 

Training devices. The training devices consisted 
of the following three transparencies: (a) the .45 
cal. pistol, device number 29GA8. This animated
transparency contains 11 moving parts; (b) the .30
cal. carbine M2, device number 29HB7. This ani-
mated transparency contains 8 moving parts; (c)
the .30 cal. rifle M1, device number 29HB6. Only
the trigger housing group section of this transpar-
ency was used. This section contains 5 moving parts.

These devices provide a continuum of motion com- 
plexity based on the number of moving parts in each 
device. At the same time, the devices are similar 
contentwise in that each describes the sequence of 
events involved in cocking and firing a hand weapon. 
These sequences are not identical in all three weap- 
ons, but they are similar enough to make the num- 
ber of moving parts the most apparent difference 
among the devices.

The animated transparencies also served as static 
transparencies. This was accomplished by present- 
ing various static views of the transparency without 
showing the actual transition movements. Thus, if 
the action of the hammer was being depicted, the 
hammer was shown in the cocked position as a 
static display, the lighting from the projector inter- 
rupted, and a second view of the hammer in the 
fired position shown. This was in contrast to the 
actual movement of the hammer as it was fired in 
the animated display. This technique insured that 
the only difference between the animated and static 
displays was the factor of movement; variables such 
as the size and color of the transparencies were auto- 
matically controlled. 

An overhead projector was used to present the 
various transparencies. 

Lectures. Tape-recorded lectures accompanied the 
presentations of the transparencies. The lectures for 
each device were similar in form and in length. Each 
lecture began with a general statement about the 
weapon and then went on to describe the nomencla- 
ture. The nomenclature was followed by a descrip- 
tion of the functioning sequences and then by a de- 
scription of the various safety mechanisms. The 
functioning sequences and safety mechanisms were 
described three times. Finally the nomenclature was 
reviewed. The lectures lasted from 14 to 16 minutes. 
Much of the lecture material was obtained from the 
field manuals and technical manuals associated with 
the weapons in question.* 

Training situation. The training was carried out 
with groups of 8 to 12 trainees. The trainees were 
first informed that this was to be a standard train-
ing situation and that they were to pay close atten-
tion, since they would be tested after training.

The trainees were assigned to the devices and trans- 
parency types at random. There were 50 trainees 
assigned to each device. Twenty-five were trained 
with the animated transparency and 25 trained with 
the static transparency. The study was carried out 
in phases, with training and testing on one device 
being completed before a second device was begun. 
The order of phases was: pistol, rifle, and carbine. 

3 FM 23-35, TM 9-1295 for the .45 cal. pistol;
FM 23-7, TM 9-1276 for the .30 cal. carbine; FM
23-5, TM 9-1275 for the .30 cal. rifle M1.


Two experimenters participated in the training. 
One experimenter operated the device and a second 
monitored the recorder. Operation of the devices 
was correlated with the lecture. With the animated 
devices the experimenter demonstrated the various 
movements described by the lecture and pointed to 
the parts indicated in the lecture. With the static 
devices, the attempt was to simulate the use of static 
slides or overlays. This was accomplished by in- 
serting a plate between the lens and bulb of the 
projector each time a new view was to be shown. 
Cues for this operation came from audible clicks 
which were recorded on the tape with the lecture. 
The static presentations were designed to impart the 
same information as did the animated presentations. 
Therefore any movement was shown in before and 
after phases. In showing the cocking of a weapon, 
a view of the uncocked weapon (the before view) 
was followed by a view of the cocked weapon (the 
after view). This was done using approximately the 
same time relationships as were used in the animated 
presentations. The cues on the tape enabled the ex- 
perimenter to gauge the time relationships. 

Tests. The same three types of tests were used 
for all devices. Test I involved knowledge (in 
verbal symbols) of function. Part A of this test 
consisted of five multiple choice questions dealing 
with maintenance. Trainees were asked to use the 
information from the training situation to trouble 
shoot particular operating difficulties. Part B was 
a fill-in-the-blanks test dealing with rote memory of 
the functioning sequences. There were nine scorable 
items in Part B. 

Test II was a nomenclature test in which trainees 
were asked to label particular parts of the weapon. 
There were 15 scorable items on this test. 

Tests I and II were paper and pencil tests ad- 
ministered in group form. 

Test III was a performance test and was adminis- 
tered individually by calling a trainee out of the 
group testing situation. The individual performance 
tests were given by two experimenters. One experi- 
menter presented the task and the second served as 
a timer and scorer. For the pistol performance test, 
the trainee being tested was seated at a table on 
which a .45 calibre pistol and an unloaded magazine 
had been placed. The trainee was instructed: “Load 
and fire the pistol.” The score was the time elapsed 
from the moment the trainee touched the weapon 
until the proper operations were performed. For 
the second test, the pistol was removed from view 
and cocked with the safety on. The trainee was 
then told: “The pistol is loaded and cocked. Fire 
the pistol.” The pistol was handed to the trainee 
as the instructions were given and timing began 
when the experimenter said the final word in the 
instructions, “pistol.” For the third test, the pistol 
was again removed from view, the slide drawn to 
the rear and locked by the slide stop. The pistol 
was handed to the trainee and he was instructed: 
“Release the slide.” Timing began when the word 
“slide” was said. On all three tasks, if a given 
trainee had not completed the proper operations in 





Robert E. Silverman 




Table 1
Means and SDs of the Error Scores for the
Function Tests


                                Device

                  Pistol          Carbine          Rifle
Trans-
parency         Mean    SD      Mean    SD      Mean    SD

Animated                2.53    6.76    1.84    4.60    2.56
Static                  1.95    7.08    1.87    5.20    2.00


90 seconds, he was shown the correct operations and 
given the score of 90.* A trainee’s total score was
the average of three tests. The procedures for the 
rifle and carbine tests were similar to those used 
with the pistol, although the tests differed with re- 
gard to specific content. 
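The scoring convention for the performance test can be summarized in a short sketch. It assumes, as the text states, a 90-second limit per task and a total score equal to the mean of the task scores; the use of None to mark a task that was never completed, and the example times, are illustrative choices only.

```python
TIME_LIMIT_SECONDS = 90.0

def task_score(elapsed_seconds):
    """Time score for one task; failures and overruns are scored as 90."""
    if elapsed_seconds is None or elapsed_seconds >= TIME_LIMIT_SECONDS:
        return TIME_LIMIT_SECONDS
    return float(elapsed_seconds)

def total_score(task_times):
    """Average of the task scores (three tasks for the pistol)."""
    scores = [task_score(t) for t in task_times]
    return sum(scores) / len(scores)

# Hypothetical trainee: two tasks completed, one not completed in time.
example_total = total_score([12.0, None, 30.0])
print(example_total)  # (12 + 90 + 30) / 3 = 44.0
```

Because the cap replaces every failure with the same value, a single uncompleted task raises a trainee's total by a fixed and sizable amount regardless of how close he came to the correct operations.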


Results and Discussion 


The results of each test are considered 
separately. Table 1 presents the results of 
the function tests in terms of mean error 
scores. The analysis of variance for these 
scores is shown in Table 2. The two tables 
reveal no significant differences between the 
animated and static transparencies. For each 
device the animated transparency produced
slightly fewer errors, but these differences did 
not approach statistical reliability. There 
were reliable differences among the mean 


*This convention was adopted to minimize frus- 
tration which might affect subsequent tasks. In 
view of the fact that no trainees discovered the cor- 
rect operations once they had gone beyond 65 sec- 
onds, the arbitrary assigning of 90 was probably 
not too conservative. However, any bias which may 
have resulted would operate against the experimental 
hypotheses. 


error scores for the devices. The pistol and 
rifle did not differ from each other, but the 
mean error scores for the carbine were greater 
than the scores for each of the other devices. 
This difference was equally apparent for both 
the static and animated transparencies. 

The relatively poor performance on the 
carbine function test cannot be accounted 
for in terms of motion complexity, since the 
carbine transparency contained eight moving
parts, in contrast to eleven for the pistol and 
five for the rifle. The absence of an inter- 
action effect indicates that the differences be- 
tween the carbine and the other two devices 
did not depend upon the training condition. 

The mean error scores and the SDs from 
the nomenclature tests are presented in 
Table 3 and the analysis of variance of these 
scores is shown in Table 2. Again there were 
no differences between the two types of trans- 
parencies. However, there were differences 
among the devices. Here, the differences 
were a function of the relatively greater num- 
ber of errors made on the pistol test. This 
effect held for both the animated and static 
transparencies. The pistol was the most com- 
plex of the devices and we might attribute 
the greater number of errors to this fact. 
However, a comparison of the absolute scores 
of the three devices is not justified, in view 
of the fact that the three tests were merely 
similar in form, not identical in relative con- 
tent. This same observation holds in refer- 
ence to the differences noted above in the 
function test. 

The performance tests were scored in terms 
of the average time required to perform the 


Table 2


Analyses of Variance of Scores for Three Tests


                      Function               Nomenclature           Performance

                      Mean                      Mean                   Mean
Source          df    Square     F        df    Square    F      df    Square     F

Device           2    74.61   15.53**      2    21.29             2      47.93
Transparency     1     5.61    1.17        1     1.93             1    2974.83
Interaction      2      .45      -         2     2.84             2     320.09
Within Cells   144     4.81              144     6.80           144     329.05

Total          149                       149                    149


 * .05 level of confidence.
** .01 level of confidence.
Table 3 


Means and SDs of the Error Scores for the 
Nomenclature Tests 











Pistol 


Mean SD 


Rifle 
Mean SD 





Trans- 
parency 





Animated 
Static 


6.48 3.13 
6.56 2.94 5.72 2.47 


5.24 2.25 
5.08 2.02 


three (two for the carbine) tasks. The means 
and SDs of the time scores are presented in 
Table 4 and the analysis of variance is pre- 
sented in Table 2. There were consistent 
differences between the animated and static 
transparencies for each of the devices, with 
the animated transparencies showing the 
shorter performance times. These differences 
were significant at the .001 level of confi- 
dence. The magnitude of the differences be- 
tween the types of transparency appeared to 
be greatest for the rifle and least for the
pistol. However, this effect was not statisti- 
cally reliable, since the F for interaction was 
less than unity. 
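
The F ratios discussed here follow from the mean squares in Table 2: each effect's mean square is divided by the within-cells mean square. The following is a minimal Python sketch of that arithmetic. It assumes that the Performance column of Table 2 reads 47.93 (device), 2974.83 (transparency), 320.09 (interaction), and 329.05 (within cells); the function and variable names are illustrative only and do not come from the original report.

```python
def f_ratio(ms_effect, ms_within):
    """Return the F ratio: effect mean square over within-cells mean square."""
    return ms_effect / ms_within


# Mean squares as read from the Performance column of Table 2 (assumed reading).
performance_ms = {
    "device": 47.93,
    "transparency": 2974.83,
    "interaction": 320.09,
}
ms_within = 329.05

ratios = {source: f_ratio(ms, ms_within) for source, ms in performance_ms.items()}

for source, f in ratios.items():
    print(f"{source:13s} F = {f:.2f}")
```

Under this reading the interaction ratio falls below 1, which agrees with the statement that the F for interaction was less than unity, while the transparency ratio is by far the largest of the three.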

The fact that the rifle produced the great- 
est difference between the animated and static 
transparencies is of interest in spite of the 
absence of a significant interaction effect. 
What makes this difference interesting is that 
the rifle contained the fewest moving parts,
and according to the predictions
stated above should have been the least sus- 
ceptible to training differences. 

An examination of the rifle performance 
test revealed differences in the behavior of 
the trainees in the animated and static con- 
ditions in the first performance task. This 


Table 4

Means and SDs of the Time Scores for the Performance Test

                                   Device
                   Carbine          Pistol           Rifle
Transparency     Mean    SD       Mean    SD       Mean    SD
Animated        25.48  18.52     28.12  15.23     23.00  16.68
Static          32.28  18.72     33.36  17.18     37.68  19.91





Table 5

Frequency of Hammer and Trigger Guard Cocking

                                   Transparency
                                Animated    Static
Number Using Hammer                 5         15
Number Using Trigger Guard         20         10


task consisted of requiring the trainees to 
cock and fire the trigger housing unit of the 
rifle. In the training demonstration cocking 
was done by pulling down and then pulling 
up the trigger guard. This action cocked the 
hammer, and it was this operation that was 
scored as correct. If a trainee cocked the 
hammer by pushing down the hammer with- 
out using the trigger guard, he was then 
asked to cock the rifle again, this time with- 
out pushing down on the hammer. The time 
required in the second cocking operation (not 
including the first operation) was used in the 
trainee’s final average. It was noted that 
many trainees from the static condition used 
the hammer method first, while only a few 
trainees from the animated condition used 
the hammer method. Table 5 shows the fre- 
quency of hammer and trigger guard cocking 
for the two types of transparency. 

The chi square for the data in Table 5 was 
8.33, significant at the .01 level of confidence. 
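
The chi square reported for Table 5 can be checked from the four cell frequencies. The sketch below is a minimal illustration, assuming the ordinary Pearson formula without a continuity correction; the text does not say which formula was used, and the function name is hypothetical.

```python
def chi_square_2x2(table):
    """Pearson chi square (no continuity correction) for a 2 x 2 frequency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected frequency under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


# Rows: hammer, trigger guard. Columns: animated, static (Table 5).
table5 = [[5, 15], [20, 10]]
chi2 = chi_square_2x2(table5)
print(round(chi2, 2))  # 8.33
```

With one degree of freedom, a value of 8.33 exceeds the .01 critical value of 6.63, in agreement with the significance level stated in the text.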

One interpretation of this finding is that 
the animated transparency permitted the 
trainees to observe the simultaneous move- 
ment of the hammer and trigger guard, while 
in the static transparency the majority of 
trainees attended to a single moving part, the 
hammer. It would seem that the animated 
training condition provided more information 
regarding relationships between moving parts. 
This interpretation is a tentative one and de- 
serves further consideration where the prob- 
lem of animated and static training devices is 
concerned. 


Summary and Conclusions 


The results of the two paper and pencil
tests, one dealing with function and the other
with nomenclature, indicated that animated
transparencies were no more effective than
were static transparencies. However, the
performance test results did show that the
animated transparencies were more effective
training devices than were the static ones. These
findings were not related to the particular
device, but were found to be equally applicable
to all three devices.

The relative superiority of the animated 
transparencies in the performance test was 
predicted from previous considerations of the 
role of verbal factors in training and testing. 
It would appear that the relatively nonverbal 
performance tests are a more sensitive index 
of training than are the standard pencil and 
paper tests. This conclusion must be quali- 
fied in terms of the purpose of the training. 
If the training purpose is to impart knowl- 
edge in verbal symbols, then the static type 
of transparency may be as effective as the 
animated type. 

A further finding from an examination of
the rifle performance test was that the
effectiveness of the animated transparency becomes
particularly apparent when more than
one part of a device is in motion at one time.
When relationships among moving parts are
involved, the animated transparency clearly
appears to be more effective.

Differences among the absolute test scores 
were observed on the paper and pencil tests. 
However, these differences were very likely a 
function of differences in the content of the 
tests, since they were not systematically related
to the complexity of the devices.


Received March 28, 1958.


Few human engineering principles appear
to be so firmly entrenched as is the principle 
that rotary controls should turn clockwise to 
increase. There appears, however, to be little 
or no experimental evidence in support of it. 
The present study therefore was undertaken 
to determine whether it corresponds to a true 
“population stereotype” or should be regarded 
merely as a convention adopted for purposes 
of standardization. In the case investigated, 
results are qualified by, and appear to be con- 
tingent upon, the fact that the display pre- 
sented feedback information without visible 
movement. 


Method 


The apparatus consisted of a box on the front of 
which were mounted a knob and a light, the latter 
directly above the former. At the beginning of a 
trial, the knob was always set in the center of its 
range of excursion at which point the light was 
either bright, dim or off. If the light was on, turn- 
ing the knob a small distance from the center posi- 
tion produced no perceptible change in brightness. 
However, turning it to either end of its range of ex- 
cursion caused the light to brighten perceptibly if it 
had initially been dim or to dim perceptibly if it 
had originally been bright. The apparatus was al- 
ways set so that turning the knob to an extreme po- 
sition would accomplish the effect requested of the S. 

The Ss were 150 male and 150 female college stu- 
dents. Each was used for a single trial under a 
single experimental condition. The S was seated 
facing the front of the box which rested on a table. 
Two hundred and forty right-handed Ss received in- 
structions requiring a change in the brightness of the 
light. There were eight such instructions of which 
the first 26 words were identical. Fifteen males and 
15 females were run under each of these eight condi- 
tions. The common introductory portion of the in- 
structions and the eight different concluding phrases 
are given below. The letters in parentheses preced- 
ing each concluding phrase will serve as an identify- 


1 This research was supported by the USAF under 
Contract No. AF 33(616)3404 monitored by the Aero 
Medical Laboratory, Wright Air Development Cen- 
ter. Permission is granted for reproduction, publi- 
cation, use and disposal, in whole or in part, by and 
for the United States Government. 

2 This research has been described in WADC Tech- 
nical Report 57-388 (see References). The present 
article is a condensed form of that report. 


The Ss instructions: 


On this box is a light (Experimenter points to 
light.) which is controlled by the knob below it. 
(Experimenter points to knob.) I would like for 
you to take hold of the knob and, 


(I) increase the light,
(D) decrease the light,
(MB) make the light brighter,
(MD) make the light dimmer,
(IB) increase the brightness of the light,
(DB) decrease the brightness of the light,
(ID) increase the dimness of the light, or,
(DD) decrease the dimness of the light.


The 60 remaining Ss received a single instruction. 
The light was removed from the box, and the entire 
box was covered by a strip of cardboard through 
which only the knob protruded. The S was in- 
structed as follows: 


(T) Before you is a knob. (Experimenter points 
to knob.) When I say “ready,” reach out 
and turn the knob. . . . Ready. 


Thirty right-handed Ss and 30 left-handed Ss were 
run under this instruction. There were 15 males and 
15 females in each of the above groups. In all of 
the above cases direction of initial knob turn was 
observed and recorded by the experimenter. 

The significance of response frequencies for a single 
category of Ss was obtained from binomial tables 
(Harvard University, 1955). The significance of the 
difference in response pattern between two cate- 
gories of Ss was tested by casting the data into 
a fourfold table and consulting tables (Mainland, 
Herrera, and Sutcliffe, 1956) of probabilities based 
on Fisher’s exact method for small samples and chi 
square with Yates’ correction for large ones. All 
tests were two-tailed. 


Results 


Complete data for this experiment are given 
in Table 1. 

Failure of sex differences to appear. Ninety- 
nine of the 150 males and 97 of the 150 
females turned the knob clockwise. Eighty- 
four of 120 males and 92 of 120 females 
turned the knob in accordance with the hy- 
pothesized clockwise-to-increase—counterclock- 
wise-to-decrease stereotype. By neither meas- 
urement is the difference in performance be- 
tween the two sexes significant. Nor does a 
significant sex difference appear in the responses
to any of the 10 individual conditions.
Therefore, the data for the two sexes
will be combined in all further comparisons. 
Effect of handedness on direction of knob 
turn when direction of functional change is 
unspecified. Of the 30 right-handed Ss asked 
simply to turn the knob, 27 turned it clock- 
wise. Of an equal number of left-handed Ss 
given the same instruction, only 19 turned 
the knob clockwise. The hypothesis of equal 
frequency of clockwise and counterclockwise 
knob turn can be refuted at far beyond the 
.001 level of significance in the former case, 
but cannot be rejected in the latter (P = 
.20). The difference in response between the 
two handedness groups is significant at the 
5% level. It must be concluded, then, that 
there is a very strong “stereotype” for right- 
handed Ss to turn a knob clockwise in a com- 
pletely “unstructured” situation in which the 
knob’s function is unknown and neither the 
intention to “increase” nor the intention to 
“decrease” has been specified. This stereo- 
type is significantly weaker, and may be al- 
together absent, in left-handed Ss. There is 
no indication however that left-handed Ss 
have a “turn counterclockwise” stereotype. 
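
The single-group tests in this paragraph ask how often a split as extreme as the observed one would arise if clockwise and counterclockwise turns were equally likely. The original analysis consulted published binomial tables; the sketch below is only a stand-in that computes the exact two-tailed probability directly, and the function name is hypothetical.

```python
from math import comb


def binomial_two_tailed_p(k, n):
    """Exact two-tailed probability of a split at least as extreme as k of n when p = .5."""
    extreme = max(k, n - k)
    # Upper-tail probability of the more extreme count; doubled because p = .5 is symmetric.
    upper_tail = sum(comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * upper_tail)


p_right = binomial_two_tailed_p(27, 30)  # right-handed Ss: 27 of 30 clockwise
p_left = binomial_two_tailed_p(19, 30)   # left-handed Ss: 19 of 30 clockwise

print(f"right-handed: P = {p_right:.6f}")
print(f"left-handed:  P = {p_left:.2f}")  # 0.20
```

The left-handed result reproduces the P = .20 given in the text, and the right-handed result lies far beyond the .001 level.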
Persistence of tendency for right-handed 
subjects to turn knob clockwise when direc- 
tion of required functional change is specified. 
When the situation is structured by the presence
of a visible display and by instructions
as to the direction of change required in the 
display, the tendency to turn clockwise per- 
sists. Two hundred and forty Ss received 
such direction-of-change instructions. For 
each subgroup receiving a particular instruc- 
tion another equal-sized subgroup received 
the opposite instruction. Therefore, although 
instructions may have biassed the Ss’ re- 
sponses (presumably, but not necessarily, 
in accordance with a clockwise-to-increase— 
counterclockwise-to-decrease stereotype), op- 
posite biases operated upon equal numbers 
of Ss so that, if the tendencies to “turn clock- 
wise” and “turn counterclockwise” were equal, 
half the responses should have been clockwise 
turns, half counterclockwise turns. Such was 
not the case. Of 240 Ss, 150 turned the 
knob clockwise, 90 counterclockwise, the in- 
equality of response frequencies being sig- 
nificant at beyond the .001 level. It must 
be concluded, therefore, that irrespective 
of any turn-clockwise-to-increase-and-counter- 
clockwise-to-decrease stereotype, there is a 
strong “turn clockwise” stereotype. 
Existence of a strong turn-clockwise-to- 
increase—counterclockwise-to-decrease stereo- 
type. Since the preceding paragraphs have 
demonstrated a “turn clockwise” stereotype 
which operates regardless of the direction of 
the required change, this stereotype must 


Table 1

Number of Subjects Turning Knob Clockwise (C) and Number Turning Counterclockwise (CC)
Under Each Instruction

                               Both Sexes
Instructions                   C        CC
Increase                      29***      1
Decrease                      11        19
Make Brighter                 28***      2
Make Dimmer                   14        16
Increase Brightness           28***      2
Decrease Brightness            9*       21
Increase Dimness              13        17
Decrease Dimness              18        12
Turn Knob
  Right-Handed Subjects       27***      3
  Left-Handed Subjects        19        11

* Significant at .05 level for two-tailed test.
** Significant at .01 level for two-tailed test.
*** Significant at .001 level for two-tailed test.

[The separate Males and Females columns of this table are not legible in this copy.]


Table 2


Two-Tailed Binomial Tests of Significance on Proportion of Subjects Turning Knob Clockwise (C) and on
Proportion Conforming to a Clockwise-to-Increase, Counterclockwise-to-Decrease Stereotype (S)

Combinations of
Instructions              % C     Sig. Level
I + D                     66.7     .01348
MB + MD                   70.0     .00268
IB + DB                   61.7     .09246
ID + DD                   51.7     .89742
I + MB + IB + DD          85.8     .00000
D + MD + DB + ID          39.2     .02208

[The remaining rows and the % S columns are only partly legible. Surviving entries, in order of appearance: 62.5; 63.3, 61.7; .00040; .00012; .00446, .01338; 84, 92; 70.0; 28, 76.7, .00000.]


somehow be subtracted or cancelled out from
the data before testing for a clockwise-to-increase,
counterclockwise-to-decrease stereotype.
This can be accomplished by combining
a subject group whose instructions require
a clockwise response to conform to the
latter stereotype with an equal-sized subject
group whose instructions require a counterclockwise
response in order to conform to it.
This has been done in Table 2. The results
show that 176 of the 240 Ss responded in
accordance with the hypothesized stereotype.
Such an extreme split of responses would occur
less than once in 100,000 times by chance.
The hypothesized stereotype is also significantly
supported by each of the subgroups of
Table 2 except for the ID + DD subgroup.
It is of interest to learn whether or not the
hypothesized stereotype is one-sided, i.e., applies
only when an increase is called for. If
this were the case, the general tendency to
turn clockwise would simply be strengthened
when an increase was required and unaffected
otherwise. Of 120 Ss required to make an
increase, 103 turned the knob clockwise. Of
an equal number required to make a decrease,
73 turned it counterclockwise. Both
results are significantly (P < .025) in conformity
with a clockwise-to-increase,
counterclockwise-to-decrease hypothesis, but differ
significantly (P < .001) in the proportion
conforming. It must be concluded, then,
that while the hypothesized stereotype operates
in both directions it is weakened when
a decrease is required, presumably because
this throws it into conflict with the “turn
clockwise” stereotype.
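
The comparison of the two proportions conforming (103 of 120 against 73 of 120) is a fourfold-table test of the kind described under Method, where chi square with Yates' correction was used for large samples. The original values were read from published tables; the sketch below is a stand-in that computes the corrected chi square directly, with hypothetical function names.

```python
import math


def chi_square_yates(a, b, c, d):
    """Chi square with Yates' continuity correction for the fourfold table [[a, b], [c, d]]."""
    n = a + b + c + d
    # Shrink the cross-product difference by n/2 (never below zero) before squaring.
    corrected = max(abs(a * d - b * c) - n / 2, 0.0)
    return n * corrected ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2):
    """Upper-tail probability of a chi-square variate with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# Rows: increase required, decrease required. Columns: conforming, not conforming.
chi2 = chi_square_yates(103, 17, 73, 47)
p = p_value_df1(chi2)
print(f"chi square = {chi2:.2f}, P = {p:.5f}")
```

The resulting probability lies below .001, which is consistent with the difference in proportion conforming reported in the text.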

Effect of phraseology upon strength of 
clockwise-to-increase stereotype. Another fac- 
tor which can be checked is whether or not 
the words “increase” or “decrease” are re- 
quired to evoke the stereotype. There is no 
significant difference in pattern of response 
between any two of the three subgroups, MB
+ MD, I + D, and IB + DB. Presumably,
then, it would be just as effective, but more 
economical, to label a knob “brightness” as 
to label it “increase brightness.” There is, 
however, a difference in response, significant 
at the .01 level, between the IB + DB and 
ID + DD subgroups. The stereotype can be 
reduced in strength, therefore, and perhaps 
even eliminated, but not reversed, by phras- 
ing the operator’s instructions in terms of the 
“antifunction.” 


Discussion 


It has been demonstrated that when manipulating
a rotary knob to effect changes in
a motionless display, operators show a strong
predilection for turning clockwise-to-increase,
counterclockwise-to-decrease. There is reason
to believe, however, that this stereotype becomes
greatly attenuated when the task is
structured elaborately enough for other determinants
of direction of knob turn to be
present and to compete with it. Specifically,
data presented in (Bradley: 1954, 1957;
Holding, 1957; Warrick, 1947) and discussed
in (Bradley, 1957) indicate that the stereotype,
if it exists at all, is not the primary determinant
of direction of knob turn when
knob rotation causes motion of an observed
display indicator.


Summary 


Right-handed Ss were asked to grasp a 
knob and turn it so as to effect a specified 
change in the intensity of a light mounted 
just above it. Equal numbers of Ss were 
asked to increase and to decrease the bright- 
ness of the light, the request being phrased 
in a variety of ways. Two significant tend- 
encies were found. 

First, 73.3% of the Ss turned the knob 
clockwise-to-increase or counterclockwise-to- 
decrease the brightness of the light. This 
tendency was strongest when an increase was 
required and when the function to be con- 
trolled was phrased in positive terms (i.e., as 
“brightness” rather than “dimness”). It was


not significantly dependent upon the use of 
the words “increase” or “decrease” or upon 


the sex of the operator. Other experiments 
indicate that these results are contingent upon 
the use of a display which presents changes 
in information without visible movement. 


Bradley 


Second, 62.5% of all Ss turned the knob 
clockwise. This general turn-clockwise tend- 
ency was found to persist among an addi- 
tional set of right-handed Ss when the light 
was covered up and the S was asked simply 
to turn the knob; among left-handed Ss (used 
only in this condition) the tendency to turn 
clockwise was not statistically significant.


In any research program concerned with
the development of predictor tests, it is neces- 
sary to have criterion measures of the behav- 
ior to be predicted. Frequently these meas- 
ures, such as pass-fail in a training program 
or an index of job productivity, are readily 
available; but just as frequently it is neces- 
sary to synthesize new criterion measures if 
the behavior to be predicted does not already 
have well-defined operational specificity. The 
latter was the case when the research to be 
presented was undertaken. The report de- 
scribes the methods and results of a project 
to develop multiple criteria for use in an 
Air Force pilot selection research program 
(Kubala, 1958; Sells: 1951, 1956; Trites, 
Kubala, & Cobb, in press), and the con- 
struct validation of these criteria by means 
of an independent categorization (Brown & 
Trites, 1957) of Ss predicted to be at the ex- 
tremes of the criterion dimensions. 

The criterion measures of primary interest 
in this investigation are related to a construct 
called adaptability (Sells, 1956) which has 
been postulated to account for certain behav- 
ior observed during and subsequent to pilot 
training. The construct refers to tempera- 
mental and motivational characteristics, such 
as emotional disturbance or program-oriented 
motivation deficit, which contribute to a
man’s success or failure in training and his 
continued adjustment to military flying. As 
such, it may be contrasted with the construct 
of ability which refers primarily to aptitude 
or skill factors accounting for flying training 
success or failure and for which a criterion of 
pass-fail in training is most appropriate. 

Inferences drawn from the two constructs 
and findings of previous research (Brown & 
Trites, 1957; Sells: 1951, 1956) suggested 
that categorization of training outcome as 
pass or failure, due principally to ability, 
motivational, or emotional causes, should be 


1 Now at the Texas Women’s University. 


differentially related to several relatively in- 
dependent criterion dimensions identified by 
factor analysis of data collected during pilot 
training. This procedure follows the inter- 
pretation of the construct validation process 
given by Campbell and Tyler. “A given sci- 
entific construct has multiple potential opera- 
tional specifications. If, as sampled, the op- 
erational specifications concur, the construct 
and the sampled measurement techniques 
have validity” (Campbell & Tyler, 1957, 
p. 91). In addition, it recognizes the sug- 
gestion recently made by Ghiselli with respect 
to the dimensional aspects of criteria. As he 
indicated, “. . . it would appear that... 
performance on any given job is best de- 
scribed in terms of several dimensions, and 
one dimension is not sufficient” (Ghiselli, 
1956, p. 2). 


Procedure 
Sample 


The 792 Ss forming the basic sample were avia- 
tion cadets in Classes 54-M to 56-E who entered pri- 
mary flight training at Graham Air Base, Marianna, 
Florida, between July 1953 and November 1954. All 
were between the ages of 19 and 28 and had already 
been preselected by a rigorous physical examination 
and a battery of aptitude tests. From these, 377 Ss 
having relatively complete data on all the variables 
considered in the study were available for intensive 
analysis. Subsequent analyses, based upon findings 
with the smaller group, utilized the total sample. 


Variables 


A total of 23 variables, representing information 
collected prior to, during, or at the end of flight 
training, were studied. These may be grouped as: 

1. Test scores resulting from relatively objective
measuring devices or objectively verifiable
characteristics of the Ss; e.g., Age, Pilot Stanine, Officer
Quality Stanine, Academic Average, Demerits, and
Solo Time.²

2 Tables giving a complete description of the variables,
the matrix of intercorrelations, the original
centroid loadings, the quartimax loadings, and final
adjusted loadings are contained in Trites et al. (in
press).


Table 1

Faculty Board Classification System

Failure Category        Method of Classification ᵃ

AE (Ability)            Students who are apparently well motivated to complete training, with
                        little or no evidence of fear or apprehension of flying, and are clearly
                        eliminated because of inability to meet flying or academic standards.

ME (Motivational)       Students clearly evidencing a lack of motivation to complete training
                        although apparently possessing adequate ability and indicating little or
                        no fear or apprehension of flying.
                        Most frequently includes:
                        1. Repeated violators of training rules.
                        2. Self-initiated eliminations indicating lack of motivation.

EE (Emotional)          Students clearly evidencing fear or apprehension of flying or exhibiting
                        disabling personality inadequacies.
                        Most frequently includes:
                        1. Self-initiated eliminations because of “fear of flying.”
                        2. Self-initiated eliminations where flight surgeon has indicated failure
                           to be due to disabling personality inadequacies.
                        3. Other eliminations where evidence indicates real cause to be fear or
                           apprehension of flying.

Ad. E (Administrative)  Students receiving hardship or compassionate discharge or eliminated for
                        physical causes where motivation, ability, and emotional status appear
                        adequate.

ᵃ These descriptions are condensations of those actually used by the raters (Brown & Trites, 1957).


2. Ratings of the Ss by (a) peers, (b) superiors, 
or (c) experts (Trites & Sells, 1957); e.g., ratings
of Judgment, Leadership, Combat Stress Tolerance, 
and Familiarity, or evaluation based upon medical 
histories and actual performance during training 
flights. 

Most of the variables represented data available 
by the end of the primary phase of flight training 
(approximately the first six months). However, 
data for two variables, Pass-Fail in flight training 
and Faculty Board Classification, were not com- 
pletely available until the completion of the basic 
phase of training (approximately the last six months). 

Because they were used to evaluate the validity of 
the criterion dimensions and the adaptability con- 
struct, the Faculty Board Classifications (FBC) may 
be considered the most important single variable in 
the study. Consequently, a detailed description is 
warranted. 

Should a student in flying training be considered 
for elimination, he would meet a board composed of 
officers representing the faculty of the flying school 
to which he has been assigned. This Faculty Board 
interviews the student and his instructors, and con- 
siders all available evidence relevant to the student’s 
status in the training program. After deliberation, 
the board decides whether the student should be 
eliminated or returned to training, recommends for 
or against training in some other aircrew specialty, 
and determines an official cause of failure. The ver- 
batim transcript of the deliberations, together with 


supporting documents, constitutes the faculty board 
proceedings. 

It has been found (Brown & Trites, 1957) that 
these proceedings contain sufficient information to 
permit reliable classification of each failure into 
ability, motivational, or emotional deficiency categories.³
Such classifications were made for failures
in the present study independently of all the other 
variables investigated. 

A description of the procedure used to derive the 
Faculty Board Classifications is presented in Table 1. 


Treatment of Data


Intercorrelations of all variables except Faculty 
Board Classifications were factored and rotated to 
approximate simple structure by an electronic com- 
puter programmed for the Thurstone centroid method 
of factoring (Thurstone, 1947) and the quartimax 
method of rotation (Neuhaus & Wrigley, 1954). 
The factoring program permitted iteration of the 
centroid solution in order to stabilize the commu- 
nality estimates. Adjustments of the quartimax ro- 
tations were made graphically, without knowledge 
of the identity of the variables, to obtain better 
orthogonal simple structure. 

3 A fourth category of elimination, administrative,
did not occur frequently enough to justify its inclusion
in analyses involving the FBC. Ss for whom
no Faculty Boards were available were also omitted.

After interpretation of the factors and computation
of factor scores, Ss were grouped as either pass
or, on the basis of their Faculty Board Classification,
as ability, motivational, or emotional failures.
Hypotheses concerning differences in factor scores
between the different groups were examined for all
Ss by one-way classification analysis of variance.
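
The group comparison described here is a one-way classification analysis of variance on factor scores, with pass and the three failure categories as the groups. The sketch below is a minimal illustration of that computation only. The scores are entirely hypothetical and are not data from the study, and the function and group names are assumptions made for the example.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict mapping group name to a list of scores."""
    scores = [x for values in groups.values() for x in values]
    grand_mean = mean(scores)
    # Between-groups sum of squares: group size times squared deviation of the group mean.
    ss_between = sum(len(v) * (mean(v) - grand_mean) ** 2 for v in groups.values())
    # Within-groups sum of squares: squared deviations from each group's own mean.
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


# Hypothetical factor scores, for illustration only.
hypothetical_scores = {
    "pass": [6, 7, 8],
    "ability_fail": [1, 2, 3],
    "motivational_fail": [2, 3, 4],
    "emotional_fail": [2, 3, 4],
}

f, df_between, df_within = one_way_anova(hypothetical_scores)
print(f"F({df_between}, {df_within}) = {f:.2f}")
```

One such analysis would be run for each factor, after which the largest and smallest group means can be inspected against the hypotheses.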


Results 


Findings are presented in two parts. The 
first covers the results of the factor analysis; 
the second describes the evaluation of the hy- 
potheses derived from consideration of the 
factor structure and the adaptability con- 
struct. 


Factor Analysis 


Eight factors were extracted from the in- 
tercorrelations of the 22 variables. Nineteen 
iterations were required to stabilize the com- 
munality estimates to meet an arbitrary cri- 
terion for a change of less than .005 in the 
estimates for each variable. At this point 
the largest value in the residual correlation 
matrix was .04. After graphic adjustment of 
the quartimax rotations, Factors VI, VII, and 
VIII were dropped from consideration. 

Factor I. The various peer ratings had 
their heaviest loadings on this factor with 7 
of the 12 ratings having loadings greater than 
.80. Examination of the definitions of the 
most heavily saturated variables suggested 
that the factor reflected a respectful attitude 
on the part of peers toward those men en- 
dowed with attributes represented in the rat- 
ings. The fact that age had a relatively large 
positive loading on this factor agreed with 
the interpretation since older men were prob- 
ably perceived as exhibiting superior judg- 
ment, leadership, and so on. Finally, identi- 
fication of a second independent factor, based 
upon peer ratings, lessened the possibility that 
this was merely an instrumental factor re- 
flecting only that ratings had been made in 
a similar fashion on similar scales. Conse- 
quently, the name given the present factor 
was Peer Respect. 

Factor II. The three variables with the 
highest loadings on this factor were peer rat- 
ings whose definitions suggested an orienta- 
tion toward the group and an interest in 
working in harmony with others. This im- 
plied a reciprocal acceptance by peers. The 
loading of the variable based upon medical 


history fitted this interpretation since it is 
logical, in the context of the pilot training 
program, that men with fewer medical com- 
plaints would be perceived as more accept- 
able on a team, likeable, and cooperative. 

The impression of general acceptability, or 
likeability, as an associate, inherent in the
principal defining variables, suggested that 
this represented a secondary dimension of 
peer evaluation independent of the more 
clearly defined Peer Respect Factor. There- 
fore, it was named the Peer Acceptance 
Factor. 

Factor III. The defining variables for this
factor were a score based on the number of 
demerits, ratings by instructors, and a peer 
rating of cooperation. Inasmuch as demerits 
were awarded by a man’s tactical officer in- 
structor and a man with many demerits was 
likely to be perceived by peers and instruc- 
tors as less cooperative and conforming, the 
grouping of variables was understandable. It 
is plausible to assume that a man who was 
resistant to the demands of the training situa- 
tion would have a low score on this factor. 
Hence, it was called the Group Conformity 
Factor. 

Factor IV. Heavy loadings of variables 
representing academic performance, intelli- 
gence, and training outcome suggested that 
the factor be named the Academic Achieve- 
ment Factor. It was considered to represent 
an ability dimension. 

Factor V. The only variables having size- 
able loadings on this factor were those in- 
dicative of actual accomplishment in fly- 
ing. The obvious conclusion was that this 
represented the ability dimension of Flying 
Achievement. 

It is noteworthy that all but one of the five 
factors could be matched by inspection with 
factors extracted in an earlier investigation 
of somewhat different training level criteria 
(Kubala, in press). The unmatched factor, 
Peer Acceptance, was probably confounded 
with the peer factor extracted in the earlier 
study. 


Evaluation of the Criterion Dimensions and 
the Adaptability Construct 


Consideration of the adaptability construct
and the defining variables for each factor led
to the formulation of four specific hypotheses
about the largest and smallest factor-score
means within each factor for Ss categorized
as pass or ability, motivational, or emotional
fail. The hypotheses, with their rationales,
were:

1. Since Ss with the greatest ability and 
adaptability should be among those who com- 
pleted the training program, the pass group 
should have the largest mean of any group 
on all factors. 

2. Since the Flying Achievement and Aca- 
demic Achievement Factors were considered 
ability dimensions, and since ability failures 
are expected to reflect defective ability pri- 
marily, the ability fail group should have the 
smallest mean of any group on these two 
factors. 

3. Previous research (Brown & Trites, 
1957) indicated that men classified as emo- 
tional failures (poor adaptability) tended to 
be eliminated relatively early in the training 
program. Such men should be very evident 
to peers in the close associations of the com- 
petitive pilot training environment soon after 
entry. Since the peer ratings used in the 
present study were obtained within the first 
six weeks of training, it was hypothesized 
that the emotional fail group should have the 
smallest means of any group on the Peer Re- 
spect and Peer Acceptance Factors. 

4. Unlike the emotional failures, men who 
have been classified as failing for lack of mo- 
tivation tended to be eliminated late in the 
training program and, on measures derived 
from flying performance, they look much 
like pass Ss (Brown & Trites, 1957). Even 
so, they are obviously exhibiting insufficient 
adaptability to the demands of the situation 
which may be reflected in an overt lack of 
conformity apparent to others and producing 
sanctions such as demerits. Hence, the mo- 
tivational fail group should have the smallest 
mean of any group on the Group Conformity 
Factor. 

Scores for the five interpretable factors 
were computed for all Ss in the total sample 
having the required data. Any variable used 
to estimate a factor score and not already in 
stanine form was rescaled to have a mean and 
standard deviation approximately equal to 
those of the stanine scale. Factor scores were 
then obtained by algebraic combination of 
the unweighted scores (Trites & Sells, 1955) 
on the appropriate variables. 
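
The scoring procedure described above can be sketched roughly as follows. This is only an illustration, not the authors' computation: the variable names and sample values are hypothetical, the stanine scale is assumed to have a mean of 5 and a standard deviation of 2, and "algebraic combination" is read as a signed, unit-weighted sum.

```python
from statistics import mean, pstdev

# Assumed stanine parameters (mean 5, SD 2).
STANINE_MEAN = 5.0
STANINE_SD = 2.0


def rescale_to_stanine_units(raw_scores):
    """Linearly rescale raw scores to the stanine mean and SD."""
    m = mean(raw_scores)
    s = pstdev(raw_scores)
    return [STANINE_MEAN + STANINE_SD * (x - m) / s for x in raw_scores]


def factor_scores(variables):
    """Unit-weighted sum across variables, one total per subject."""
    return [sum(values) for values in zip(*variables)]


# Hypothetical data for five Ss: one variable already in stanine form,
# one raw count (demerits) that must first be rescaled.
peer_rating_stanine = [6, 3, 8, 5, 4]
demerits_raw = [12, 30, 7, 18, 25]

demerits_rescaled = rescale_to_stanine_units(demerits_raw)
# Demerits are scored in the unfavorable direction, so they enter with a minus sign.
conformity_scores = factor_scores(
    [peer_rating_stanine, [-d for d in demerits_rescaled]]
)
print(conformity_scores)
```

Rescaling before summing keeps any one variable from dominating the total merely because its raw units are larger.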

Table 2 

Factor Score Means and F Values from Analysis of Variance 

Columns: Factor; Pass; Failures (Ability, Motivational, Emotional); F 
Rows: Peer Respect; Peer Acceptance; Group Conformity; 
Academic Achievement; Flying Achievement 

Entries legible in this copy, in source order 
(cell positions are not recoverable): 

30.8        14.61*** 
31.6  25.2 
108   55   5 
9.4   9.5  8.4 
108   56   26        4.29** 
19.6  17.5 
108   55             22.35*** 
7.9   8.9 
56 
                     44.64*** 

** Significant at less than the .01 level. 
*** Significant at less than the .001 level. 

As mentioned previously, the hypotheses 
were evaluated by one way classification 
analysis of variance. Table 2 contains the 
F values, means, and the number of Ss in 
each group for each factor. The compari- 
sons of the means of the pass, ability fail, 
motivational fail, and emotional fail groups 
were all significant at less than the .01 level. 
In every instance the groups with the largest 
and smallest means were those which had 
been hypothesized.⁴ 
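
For readers who want the arithmetic behind a one way classification analysis of variance, a minimal sketch follows. The group scores are hypothetical and are not the study's data; the function simply forms the ratio of the between-groups mean square to the within-groups mean square.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-groups sum of squares: group size times squared deviation
    # of each group mean from the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Within-groups sum of squares: deviations of scores from their own group mean.
    ss_within = sum(
        sum((x - m) ** 2 for x in g) for g, m in zip(groups, group_means)
    )

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical factor scores for three outcome groups.
example_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_ratio, df_b, df_w = one_way_anova_f(example_groups)
print(f_ratio, df_b, df_w)
```

With four outcome groups, as in the present study, the between-groups degrees of freedom would be 3.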


Discussions and Conclusions 


Agreement among the different characterizations 
of adaptability, derived from the factor 
structure and the adaptability construct, 
supports the validity of the construct. Within 
the limits of the study, it has been possible 
to define operationally three dimensions of 
adaptability (Peer Respect, Peer Acceptance, 
and Group Conformity) and two dimensions 
of ability (Flying Achievement and Academic 
Achievement). Through confirmation of the 
hypotheses concerning the relationships be- 
tween these dimensions and groups of Ss 
classified as pass, ability fail, motivational 
fail, and emotional fail, it may be concluded 
that useful criteria of adaptability have been 
isolated. 

Further evaluation of the adaptability dimensions 
is necessary. It is of particular importance 
to determine the relation of individual 
differences among the pass Ss to posttraining 
adaptability assessments. Fortunately, 
there is evidence (Kubala, in press; Trites 
& Kubala, 1957) that criterion assessments 
corresponding to the Peer Respect and Group 
Conformity Factors are marginally, but sig- 
nificantly, related in the expected direction 
to posttraining evaluations of adaptability. 


4 Since the majority of failures occurred during 
primary flight training and all of the variables used 
in the factor scores were collected during this same 
period, the possibility existed that the Faculty Board 
members may have been aware of some of the items 
contained in the factor scores. To avoid this possible 
lack of independence, the variance analysis was 
repeated using only pass Ss and eliminees from the 
later, basic training phase. Although the number of 
fail cases was greatly reduced, the hypotheses were 
generally supported by the over-all F tests and the 
direction of mean differences, or by the finding that 
the only significant differences between individual 
pairs of means, using the Scheffé criterion (1953), 
were those which had been predicted as being highest 
and lowest on the factors. 


In addition, an unpublished investigation of 
Officer Effectiveness Reports has revealed a 
significant correlation in the predicted direc- 
tion between demerits accrued during pri- 
mary pilot training and later ratings of effec- 
tiveness. 

On the basis of these findings the Group 
Conformity and the Peer Respect Factors 
have been combined to form a composite 
Adaptability Index. This Index, together 
with the other factor scores, can be used to 
structure samples in order to achieve better 
control for research purposes. 


Summary 


A factor analysis of 22 variables obtained 
for aviation cadets during pilot training re- 
vealed five interpretable factors: Peer Re- 
spect, Peer Acceptance, Group Conformity, 
Academic Achievement, and Flying Achieve- 
ment. Hypotheses derived from the construct 
of adaptability were supported by comparison 
of factor scores for groups of Ss classified ac- 
cording to training outcome as pass, ability 
fail, motivational fail, or emotional fail. This 
was considered evidence for the validity of 
the construct and the usefulness of the cri- 
terion dimensions. 
A COMMENT ON THE RECENT STUDY OF THE 
MECHANICAL COMPREHENSION TEST (CC) 
BY R. L. DECKER 


W. A. OWENS 


Iowa State College 


The subject article, intended as a partial 
evaluation of the Mechanical Comprehension 
Test, Form CC, indicates a considerable lack 
of understanding of the purposes for which 
the test was developed and of its appropriate 
uses. It is also lacking in accuracy to such 
an extent that some comments on it appear 
to be in order. 

First, Decker (1958) quotes the writer as 
saying that Form CC was designed to meas- 
ure “the degree of aptitude needed for suc- 
cessful performance in engineering courses 
or in engineering positions after graduation” 
(italics mine). But what the original article 
(Owens, 1950) actually said was: “The cen- 
tral problem of the present investigation is 
to evaluate this new test of Mechanical Com- 
prehension, Form CC, in the potential selec- 
tion of engineering students” (italics mine). 

Second, Decker says: 


On the basis of the results obtained when the test 
was administered to 725 incoming freshmen, Owens 
concludes that the Form CC scores were making a 
significant independent contribution to the predic- 
tion of engineering school grades. Although this 
conclusion may be supported by the obtained data, 
it seems that the amount of this contribution was 
so small that consideration of scores on the test in 
addition to general aptitude test scores in selecting 
students or employees might not be justifiable con- 
sidering the added time and expense involved. 


This is a particularly curious statement since 
Table 5 of the original article deals with this 
precise point and indicates the independent 
contributions of the ACE, high school average, 
and the MCT to the prediction of certain 
academic criteria. The most relevant of 
these is Engineering Drawing grade, to the 
prediction of which the ACE contributes 
44%, the high school average 15%, and the 
MCT 41% of the total predictable variance. 
Certainly by the standards of most students 
of measurement this last is a substantial and 
practically significant contribution, and the 
test is worth giving if the ACE is worth 
giving. 

Third, Decker used the MCT to predict 
supervisory performance in a group of Ss 
characterized as follows: (a) 80% were col- 
lege graduates in engineering or the physical 
sciences; (b) they had served an average of 
four and one-half years in a large manufac- 
turing organization; and (c) they had all 
been screened with a general aptitude test 
prior to hiring. It goes almost without say- 
ing that MCT was not designed to predict 
supervisory performance; and that even if the 
criterion had been appropriate, the subject 
group is so clearly restricted at a high level 
as to render prediction on the basis of me- 
chanical aptitude difficult if not impossible. 
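
The effect of such restriction on a validity coefficient can be illustrated with the standard formula for direct selection on the predictor. The sketch below is an illustration only: it assumes linear regression and equal error variance across the range, and the numerical values are hypothetical rather than taken from either article.

```python
from math import sqrt


def restricted_correlation(r_unrestricted, sd_ratio):
    """Correlation expected in a group selected on the predictor.

    sd_ratio is the predictor SD in the selected group divided by
    its SD in the unselected group (1.0 means no restriction).
    """
    r = r_unrestricted
    u = sd_ratio
    return (r * u) / sqrt(1.0 - r * r + r * r * u * u)


# Hypothetical case: a validity of .50 in an unselected group, observed in
# a group whose predictor SD has been cut in half by prior screening.
print(restricted_correlation(0.50, 0.5))
```

The smaller the ratio of standard deviations, the more the observed coefficient shrinks toward zero, whatever the validity in the unselected group.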

Fourth, having found a test-criterion correlation 
of 0.074, Decker made an item analysis. 
He obtained 14 significant item-test correlations, 
3 of which were negative and 11 of 
which were positive (not too strangely the 
median value of the former was −0.25 and 
that of the latter +.025). Without benefit 
of cross-validation, he then used the 11 items 
as a test, obtained a criterion correlation of 
0.31 and recommended: “further research to 
determine the characteristics of the valid 
items. . . .” In this context, the writer does 
not feel called upon to say poorly what Cureton 
(1950) has already said well. 
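
Why cross-validation matters here can be shown with a small simulation. The sketch below is hypothetical in every particular (60 items, 100 Ss, purely random responses and criterion scores): it picks the 11 items that happen to correlate best with the criterion in one sample, then scores the same 11 items in a fresh sample.

```python
import random
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def draw_sample(rng, n_subjects, n_items):
    """Random right/wrong item responses and an unrelated criterion."""
    items = [
        [rng.randint(0, 1) for _ in range(n_subjects)] for _ in range(n_items)
    ]
    criterion = [rng.gauss(0.0, 1.0) for _ in range(n_subjects)]
    return items, criterion


def simulate(n_subjects=100, n_items=60, n_keep=11, seed=1):
    """Return (r in the selection sample, r in a fresh sample)."""
    rng = random.Random(seed)
    items_a, crit_a = draw_sample(rng, n_subjects, n_items)
    items_b, crit_b = draw_sample(rng, n_subjects, n_items)

    # Keep the items with the highest item-criterion r in sample A only.
    ranked = sorted(
        range(n_items),
        key=lambda j: pearson(items_a[j], crit_a),
        reverse=True,
    )
    keep = ranked[:n_keep]

    def total(items):
        return [sum(items[j][s] for j in keep) for s in range(n_subjects)]

    return pearson(total(items_a), crit_a), pearson(total(items_b), crit_b)


r_same_sample, r_new_sample = simulate()
print(r_same_sample, r_new_sample)
```

Because the items carry no real validity, the correlation in the selection sample reflects capitalization on chance alone, which is why a coefficient obtained without cross-validation cannot be taken at face value.
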
RELATIONSHIP OF NATIONAL MERIT SCHOLAR- 
SHIP SCREENING TEST SCORES TO TEST 
DATA OBTAINED EARLIER IN HIGH 
SCHOOL 


EDWARD O. SWANSON AND WILBUR L. LAYTON 


Student Counseling Bureau 


Nation-wide attention has been focused on 
the problem of financing higher education. 
Scholarships, both private and federal, have 
been the topic of discussion recently. Most 
of this attention has been devoted to the 
granting of scholarships after the student has 
virtually completed high school and indicated 
his desire for higher education. Very little 
attention has been given to the fact that 
family attitudes and other social and cultural 
pressures combine to keep students from 
higher education even though they have the 
ability and may have the necessary financial 
resources (Berdie, 1954). A recognition of 
these attitudinal and societal factors emphasizes 
the need for early identification of talent 
and the counseling of talented students beginning 
as early as possible in their school 
career. The ninth grade appears to be a critical 
time in the career of students and it is at 
this point that identification might best take 
place and counseling begin. As our store of 
knowledge about the long-range predictive va- 
lidity of tests used for identification purposes 
in the ninth grade grows, this counseling can 
become more efficient. The present study 
describes an attempt to determine the pre- 
dictive validity of tests given to students 
early in their high school careers through the 
Minnesota State-Wide Testing Program. Spe- 
cifically, these test data were correlated with 
scores on the National Merit Scholarship Cor- 
poration Screening Test. It was hoped that 
the results of this study would give counselors 
information about how students’ early test 
data are related to scores on the Screening 
test. 

In 1955 the initial screening test of the 
National Merit Scholarship Corporation was 
given to 1543 high school seniors in 336 high 
schools in Minnesota. This group consisted 
of 659 boys and 884 girls. From the files of 
the Student Counseling Bureau State-Wide 
Program, test scores of these persons were 
obtained and correlated with their scholarship 
screening test scores. In addition to the 
correlations, means and standard deviations 
were determined. 

Table 1 shows, by sex and for the sexes combined, 
the means, standard deviations, and number 
of those persons in the screening group who 
had taken each test administered through the 
state-wide testing program. This table also 
shows the students’ year in school during 
which the state-wide test was taken. For 
example, Table 1 shows that 178 boys and 
315 girls in the screening group had taken 
the American Council on Education Psy- 
chological Examination (ACE) during their 
freshman year in high school. The mean of 
the boys’ scores on the ACE is 94.0 and the 
standard deviation is 15.4. For the girls, the 
mean is 91.7 and the standard deviation is 
14.8. For boys and girls combined, the mean 
of the ACE taken as freshmen is 92.5 and the 
standard deviation is 15.0. 
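
The combined-sex figures can be checked against the separate figures for boys and girls. The sketch below is an illustration added for that purpose; it assumes the reported standard deviations were computed with N (not N − 1) in the denominator, which the article does not state.

```python
from math import sqrt


def combine_groups(groups):
    """Pool (n, mean, sd) triples into one (n, mean, sd) for the total group."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Total variation = within-group variation + variation of the group
    # means around the grand mean.
    total_ss = sum(
        n * (sd ** 2 + (m - grand_mean) ** 2) for n, m, sd in groups
    )
    return n_total, grand_mean, sqrt(total_ss / n_total)


# Ninth-grade ACE figures quoted in the text: (N, mean, SD).
boys = (178, 94.0, 15.4)
girls = (315, 91.7, 14.8)

n_all, mean_all, sd_all = combine_groups([boys, girls])
print(n_all, round(mean_all, 1), round(sd_all, 1))
```

The pooled values come out close to the reported combined mean of 92.5 and standard deviation of 15.0; small discrepancies are to be expected from rounding in the published figures.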

The boys’ mean on the ACE corresponds 
to a centile rank of 95 based on Minnesota 
state-wide norms. The girls’ mean corre- 
sponds to a centile rank of 94 on the same 
norms, and the mean for the combined sex 
group corresponds to a centile rank of 95. 

From Table 1 we see that the mean high 
school rank for boys is 91.8 and for girls is 
94.7. This group, highly selected on high 
school rank, ranked on the average in the 
upper 5% compared to Minnesota state-wide 
norms on a ninth-grade scholastic aptitude 
test. 

Though the women averaged higher than 
men on high school rank, 94.7 to 91.8, the 
men averaged significantly higher than did 
the women on the Merit Screening test itself, 
57.7 to 49.8. Men averaged higher than did 
the women on all the tests studied except the 
Cooperative English Tests given at the ninth 
grade and again at the eleventh grade, and 
the Clerical part of the Differential Aptitude 
Tests (DAT). The women scored higher on 
these tests. On the Numerical Ability part 
of the DAT, the men’s and women’s averages 
were equal. 

Table 2 shows the correlation coefficients 
between scores on the National Merit Schol- 
arship Screening Test and the scores on the 
various state-wide program tests. 

In Table 2 note that of those tests administered 
at the ninth-grade level, the ACE, the 
Cooperative English Test, and the Verbal 
Reasoning section of the Differential Aptitude 
Test Battery yield the highest correlations. 
Unfortunately, we had no sample of 
students who had taken the Iowa Tests of 
Educational Development (ITED) as ninth 
graders, for ITED composite scores obtained 
on later grades show a very substantial relationship 
to Screening test scores. In making 
comparison among these tests, one must recall 
that the ITED involves approximately 
eight hours of administration time whereas 
the other tests require only from 25 to 80 
minutes. The Verbal Reasoning test of the 
Differential Aptitude Test Battery appears to 
do an excellent job of predicting, considering 
that it is a 30-minute test. 

As one would expect, the college edition of 
the ACE Psychological Examination which 
was given during the junior year correlated 
quite substantially with the screening test 
taken a year later. Surprisingly, the Cooperative 
English Test given during the junior 
year did not correlate as well as the ninth-grade 
test, although this may be due to the 
incidental selection factor. Since it is an 
achievement test and the students were 
largely selected on the basis of high school 
performance for eligibility to take the screening 
test, there probably was a restriction in 
range on the English test which could help 
account for the lower correlation. However, 
all test distributions probably are curtailed 
by the selection factor. It is obvious that 
high school rank being an explicit selection 
factor accounts for the low relationship between 
high school rank and score on the 
screening test. 


Table 1 
Means and Standard Deviations of Minnesota State-Wide Program Tests of Those Persons 
Taking the National Merit Scholarship Screening Examination in 1955 

                        Taken            Women            Combined Sexes 
Tests*                  as           N     Mean     SD         SD 
 1. ACE                 Fr.         315    91.7    14.8       15.0 
 2. Eng.                Fr.         185   259.5    49.7       52.3 
 3. Coop. Math.         Fr.         151    86.8    16.2       16.5 
 4. Coop. Science       Fr.         155    83.4    18.4       20.4 
 5. Coop. Soc. Sci.     Fr.         151    96.5    16.8       18.3 
 6. DAT-Verbal          Fr.         116    29.4     7.9        -- 
 7. DAT-Numerical       Fr.          --    27.0     6.7        -- 
 8. DAT-Abstract        Fr.          --    34.7     7.2        -- 
 9. DAT-Spatial         Fr.          --    54.0    18.0        -- 
10. DAT-Mechanical      Fr.          --    32.9     9.3        -- 
11. DAT-Clerical        Fr.          --    62.6    10.9        -- 
12. ITED                Soph.        --    21.7     3.2        -- 
13. ITED                Jr.          --    25.5     3.5        -- 
14. ITED                Sr.          --    26.6     4.4        -- 
15. ACE                 Jr.          --   127.5     --         -- 
16. Eng.                Jr.          --   183.9     --         -- 
17. HSR                 Jr.          --    94.7     --         -- 
18. NMST                Sr.          --    49.8     --         -- 

-- Entry not legible in this copy. The Men columns and the remaining 
Combined Sexes columns are not legible; the values 304, 7.6, 6.1, and 6.9 
also appear but cannot be placed. 

* Test identification code: 

 1. American Council on Education Psychological Examination, 1947 High School Edition 
 2. Cooperative English Test, Form Y, lower level, single booklet edition, total score 
 3. Cooperative Mathematics Test, Form Y, for Grades 7, 8, and 9 
 4. Cooperative Science Test, Form Y, for Grades 7, 8, and 9 
 5. Cooperative Social Science Test, Form Y, for Grades 7, 8, and 9 
 6. Differential Aptitude Test (Form A), Verbal Reasoning 
 7. Differential Aptitude Test (Form A), Numerical Ability 
 8. Differential Aptitude Test (Form A), Abstract Reasoning 
 9. Differential Aptitude Test (Form A), Spatial Relations 
10. Differential Aptitude Test (Form A), Mechanical Reasoning 
11. Differential Aptitude Test (Form A), Clerical Speed and Accuracy 
12. Iowa Tests of Educational Development, Composite Standard Score 
13. Iowa Tests of Educational Development, Composite Standard Score 
14. Iowa Tests of Educational Development, Composite Standard Score 
15. American Council on Education Psychological Examination, 1952 College Edition 
16. Cooperative English Test, Form Z, lower level, total score (Effectiveness and Mechanics of Expression) 
17. High School Rank (rank in junior class based on cumulative high school scholastic record through the eleventh grade) 
18. National Merit Scholarship (Initial Screening) Test Score. 


Table 2 
Correlation Coefficients of Scores on State-Wide Program Tests with Scores on 
National Merit Scholarship Screening Examination 

                            Men           Women      Combined Sexes    Tests 
Test*                     N      r       N      r       N       r      Taken as 
 1. ACE                  178   .682     --    .651     493    .654     Fr. 
 2. Eng.                 119   .698     --    .660     304    .580     Fr. 
 3. Coop. Math.           93   .617     --     --      244    .609     Fr. 
 4. Coop. Science         98   .558     --     --       --    .633     Fr. 
 5. Coop. Soc. Sci.       91   .641     --     --      242    .675     Fr. 
 6. DAT-Verbal            79   .704     --     --       --    .675     Fr. 
 7. DAT-Numerical         80   .350     --     --      196    .496     Fr. 
 8. DAT-Abstract          79   .437     --     --      194    .457     Fr. 
 9. DAT-Spatial           79   .462     --     --       --     --      Fr. 
10. DAT-Mechanical        79   .501     --     --       --     --      Fr. 
11. DAT-Clerical          66   .160     --     --       --     --      Fr. 
12. ITED                  87   .742     --     --       --     --      Soph. 
13. ITED                  41   .822     --     --      106     --      Jr. 
14. ITED                  24   .604     --     --       62     --      Sr. 
15. ACE                  641   .682     --     --     1500     --      Jr. 
16. Eng.                 639    --      --     --     1498     --      Jr. 
17. HSR                  643   .268     --     --     1511     --      Jr. 

-- Entry not legible in this copy. For Tests 4 and 6 only one coefficient 
survives outside the Men column; it is entered under Combined Sexes but 
may belong to the Women column. 

* See identification code for Table 1. 

This study has demonstrated that there is 
a considerable amount of predictive validity 
in the nationally available tests used in the 
Minnesota State-Wide Testing Program for 
predicting standing on the National Merit 
Scholarship Screening Test. There appears 
to be enough predictive validity so that these 
instruments can be used as early as the ninth-grade 
level, as well as later, to identify talented 
students in terms of their later position 
on the National Merit Scholarship 
Screening Test. These students and their 
parents can then be urged to consider higher 
education as one alternative in planning edu- 
cational and vocational futures. 
SEX AS A DETERMINANT OF DRIVING SKILLS: 
WOMEN DRIVERS! 


LEONARD UHR 


University of Michigan 


The present experiment is an interesting 
(and curious) example of simple, yet con- 
trolled, statistical testing in a complex real- 
life situation. 

Research to determine correlates of driving 
skills has been relatively unsuccessful (Conger, 
Gaskill, Glad, Rainey, Sawrey, & Turrell, 
1957; Granier, 1954; Lauer, 1955; Miller, 
1955). This may be due to the driver’s 
ability to compensate, in the usual situation, 
for his deficiencies. 

The advent of the motor scooter (an un- 
common, unprepossessing, but fast machine), 
with a special law licensing 14-year-olds to 
drive vehicles of less than 5 horsepower, con- 
fronted the Michigan auto driver with an 
unusual and stressful but (to the auto) rela- 
tively safe situation, in which novel decisions 
were needed as to speed and size of the 
oncoming vehicle. A controlled and “blind” 
experimental test of the effects of this con- 
frontation was set up and run as follows. 

An auto was first judged to be behaving 
dangerously toward a motor scooter (by cutting 
across or into the scooter’s path from a 
stop street or alley so that the scooter driver 
was forced to brake or swerve his vehicle). 
Only after this judgment was made, the sex 
of the auto driver was determined. If sex 
could be ascertained at the time the judgment 
was made, the incident was not counted. 
Twenty-five such incidents were accumulated, 
along with 25 comparison incidents, the first 
matching situation subsequently observed in 
which the auto driver was judged to be acting 
safely toward a scooter (by remaining stopped 
until the scooter passed), and only then identified 
as to sex. 
The results may be seen in Table 1. 


Table 1 
Incidents of Dangerous and Safe Auto Driving, by Sex 

                       Auto Driver’s Behavior 
Auto Driver’s Sex      Dangerous      Safe 
Male                        6          -- 
Female                     19          -- 
Total                      25          -- 

-- Entry not legible in this copy. 
Note. χ² = 20.8 (significant beyond the 0.00001 level); 
C = .54 (C = .71 represents a perfect Contingency Coefficient 
for a 2 × 2 table). 
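
The statistics reported with Table 1 can be reproduced with the usual formulas for a 2 × 2 table. In the sketch below the Dangerous counts (6 and 19) are those legible in the table; the Safe counts (22 and 3) are an assumption, chosen because they sum to the 25 comparison incidents and reproduce the reported values. No continuity correction is applied.

```python
from math import sqrt


def chi_square_2x2(a, b, c, d):
    """Chi-square and contingency coefficient C for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    contingency = sqrt(chi2 / (chi2 + n))
    return chi2, contingency


# Rows: male, female. Columns: dangerous, safe.
# The safe counts are assumed, as explained above.
chi2, c_coef = chi_square_2x2(6, 22, 19, 3)
print(round(chi2, 1), round(c_coef, 2))

# Upper limit of C for a 2 x 2 table, the ".71" mentioned in the table note.
print(round(sqrt(1 / 2), 2))
```

The upper limit of C for a table with k rows and k columns is the square root of (k − 1)/k, which gives .71 when k = 2.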


Summary 


An auto driver’s behavior was judged either 
dangerous or safe in an unusual, stressful, 
but relatively safe situation. This behavior 
was found to be related to the driver’s sex 
at the 0.00001 level of confidence. 
USE OF THE GENERAL APTITUDE TEST BATTERY 
TO DETERMINE APTITUDE CHANGES WITH 
AGE AND TO PREDICT JOB 
PERFORMANCE¹ 


MICHAEL HIRT 
Walter Reed Army Medical Center 


Census data (U. S. Department of Com- 
merce, Bureau of the Census, 1952) since 
the turn of the century have indicated a con- 
sistent increase in the number of older people 
in our population. This growth has been both 
relative and absolute, and 1955 data indicate 
that approximately 10% of our population 
is over 65 years old and one third is over 45 
years old. The consequences of such popula- 
tion changes have serious social, economic, 
and psychological implications. 

Many of the technological changes which 
have increased man’s life span have simul- 
taneously brought about such _ industrial 
changes as to shorten and jeopardize the span 
of working years permitted to the majority 
of our employed population. In view of 
our society’s implicit philosophy that man’s 
usefulness ceases when he stops working, the 
employment problems encountered by our 
increasing aged population are of paramount 
concern. 

The United States Employment Service 
(USES) is the federal agency whose primary 
responsibility is the procurement of job op- 
portunities for all desiring to work. The 
results of numerous studies (U. S. Depart- 
ment of Labor, Bureau of Employment Secu- 
rity: 1950a, 1950b, 1956; U. S. Department 
of Labor, Bureau of Labor Statistics, 1946) 
conducted by this agency have indicated con- 
clusively that men and women, upon reaching 
their 35th birthday, may expect to encounter 
excessive difficulties in securing employment. 
Although the age at which an applicant is 
considered old varies among companies, the 
vast majority of our working population is 
discriminated against by the time they are 
45 years old. 

1 This study is based upon a dissertation submitted 
to the graduate school at the University of Nebraska, 
in partial fulfillment of the requirements for the 
degree of Doctor of Philosophy. 


In assessing their applicants, the USES 
relies heavily upon the General Aptitude Test 
Battery (GATB). This is a multifactor test 
(Dvorak, 1956) which yields scores in the 
following nine areas: Intelligence (G), Verbal 
Aptitude (V), Numerical Aptitude (N), Spa- 
tial Aptitude (S), Form Perception (P), 
Clerical Perception (Q), Motor Coordination 
(K), Finger Dexterity (F), Manual Dex- 
terity (M). Since the GATB is of such 
consequence in the employment of USES ap- 
plicants, extensive analysis of this instrument 
is justified. Specifically, the purpose of this 
study was to yield data on the following: 


1. Is the relationship between age and apti- 
tudes in the form of a straight line or a 
curve? 


2. What is the relationship between these 
aptitudes and job performance? 

3. Which combination of aptitudes and age 
can best explain the variation in job per- 
formance? 


Method 


The sample used in the study was selected from a 
“population” of approximately 1500 Ss. The ages of 
the Ss ranged from 19 to 83 years; their educational 
level ranged from 6 to 14 years. This population 
represented 16 occupations which, in terms of the 
Dictionary of Occupational Titles structure, were 
distributed within the 8 and 9 groups. In other 
words, all these occupations are of an unskilled or 
semi-skilled nature. All the Ss were experienced 
workers and were evaluated with a descriptive rating 
scale developed by the USES in accordance with 
their procedure for developing test batteries for 
employee selection (U. S. Department of Labor, 
Bureau of Employment Security, 1952). 

The population was divided into four age groups: 
25 to 34, 35 to 44, 45 to 54, and 55 and older. 
One hundred Ss were randomly selected from each 
of these age groups, yielding a total sample of 
400 Ss. 

Each of the aptitudes as well as supervisory ratings 
were plotted against age. The test for nonlinearity 
which was used consists of “comparing the sum of 
squares for linear regression with the sum of squares 
for quadratic regression” (Wert, Neidt, & Ahmann, 
1954). The results of this analysis are summarized 
in Table 1. 
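
A comparison of this kind can be sketched as follows. This is an illustrative textbook version, not necessarily the exact computation given by Wert, Neidt, and Ahmann: it fits a straight line and a second-degree curve by least squares, tests the gain in the regression sum of squares with 1 and n − 3 degrees of freedom, and reports the age at which the fitted curve turns. The data are hypothetical.

```python
def quadratic_advantage(x, y):
    """Return (F for the gain of quadratic over linear, turning point of the curve)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n

    # Centered linear term and centered squared term.
    x1 = [xi - mx for xi in x]
    mean_sq = sum(v * v for v in x1) / n
    x2 = [v * v - mean_sq for v in x1]
    yc = [yi - my for yi in y]

    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, yc))
    s2y = sum(b * c for b, c in zip(x2, yc))
    syy = sum(c * c for c in yc)

    # Least-squares weights for the two-term (quadratic) equation.
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det

    ss_linear = s1y * s1y / s11
    ss_quadratic = b1 * s1y + b2 * s2y
    ss_residual = syy - ss_quadratic

    f_gain = (ss_quadratic - ss_linear) / (ss_residual / (n - 3))
    turning_point = mx - b1 / (2.0 * b2)
    return f_gain, turning_point


# Hypothetical aptitude scores that rise to about age 31.5 and then decline,
# with a small deterministic disturbance added.
ages = list(range(20, 71))
scores = [
    92.3 - 0.0128 * (age - 31.5) ** 2 + 0.05 * ((i * 7) % 5 - 2)
    for i, age in enumerate(ages)
]
f_gain, peak_age = quadratic_advantage(ages, scores)
print(f_gain, peak_age)
```

A large F indicates that the curve accounts for reliably more variance than the straight line; the turning point is a maximum when the weight on the squared term is negative.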

The determination of the most efficient combina- 
tion of aptitudes and age to predict the criterion 
(supervisory ratings) was based on the analysis of 
multiple regression. This multiple regression consisted 
only of those aptitudes which, when considered 
singly, correlated significantly with the criterion. 


Results 


The statistical significance of the F test 
associated with the advantage of the quad- 
ratic over the linear regression indicates 
whether the particular aptitude is related to 
age in a linear or nonlinear manner. The 
index of correlation (R,) indicates the cor- 
relation resulting from the curve, and the 
zero order correlation (r,,) indicates the rela- 
tionship between each aptitude and age. The 
significance of R, is indicated by the F test 
associated with the quadratic equation. 

It can be seen from Table 1 that Aptitude 
G, V, N, and S are related to age in a 
curvilinear relationship. The index of cor- 
relation for each of the aptitudes as well as 
the criterion is statistically significant; all the 
zero order correlations between the aptitudes 
as well as the criterion with age are signifi- 
cant and negative, with the exception of 
Aptitudes K and M, which are related in a 
positive and significant manner. 


Table 1 
Advantage of the Quadratic over Linear Regression 
for the General Aptitude Test Battery 
and the Criterion 

                                      F Value 
Aptitudes      R        r       Quadratic    Advantage 
G            .266      --        15.106**     10.035** 
V            .266     .245**     15.164**      4.669* 
N            .232      --        11.298**      3.664* 
S            .592    −.560**    107.358**     27.799** 
P            .487     .487**     61.791**       .557 
Q            .372      --        31.884**       .667 
K            .266      --        15.120**       .761 
F            .370      --        31.577**      1.070 
M            .309     .303**     20.962**      1.503 
Criterion    .191     .189*      11.373**       .423 

-- Entry not legible in this copy; signs of the zero order 
correlations are shown only where legible. 
* Significant at the .05 level. 
** Significant at the .01 level. 


Table 2 
Changes in Aptitudes G, V, N, and S at Various Ages 

                    Aptitudes 
Age       G         V          N          S 
20        --      90.645     82.688     90.216 
25        --      91.792     83.940     90.722 
30        --      92.299     84.524     90.793 
31        --      92.324     84.560     90.756 
32        --      92.324     84.570     90.700 
33        --      92.297     84.554     90.627 
--        --      92.168     84.440     90.429 
37        --      91.396     84.219     90.161 
40        --      91.396     83.688     89.629 
45        --      89.986     82.269     88.394 
50        --      87.937     80.182     86.724 
55        --      85.248     77.427     84.618 
60        --      81.920     74.005     82.077 
--        --      73.345     65.157     75.689 

-- Entry not legible in this copy. 


Table 2 indicates the changes which occur 
in Aptitudes G, V, N, and S at various ages. 

It can be seen from Table 2 that these four 
aptitudes reach their peak at ages 37, 31, 
32, and 30, respectively, and then begin to 
decline. 

When the zero order correlations between 
the criterion and the aptitudes as well as age 
were computed, it was found that Aptitudes 
P, Q, K, M, and age correlated at the .01 
level with the criterion. Table 3 indicates 
the results of the analysis of multiple regression 
between these five variables and the 
criterion. 


Table 3 
Analysis of Multiple Regression of the Criterion 

Source of Variation                                          R        F 
Five variable regression (Aptitudes P, Q, K, M, and age)   .337     8.970* 
Four variable regression (Aptitudes Q, K, M, and age)       --       -- 
Three variable regression (Aptitudes K, M, and age)         --       -- 
Two variable regression (Aptitude M and age)                --       -- 
Two variable regression (Aptitude K and age)                --       -- 

-- Entry not legible in this copy. 
* Significant at the .01 level. 

It can be seen from Table 3 that only the 
elimination of one of the variables, Aptitude 
K, results in a significantly lower multiple 
correlation. In other words, this is the only 
variable which contributes significantly in the 
prediction of the criterion. 
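
Whether dropping a variable lowers the multiple correlation significantly is commonly tested with an F ratio on the change in R squared. The sketch below shows that standard test; it is not necessarily the computation used in this study. The full-model R of .337 and the sample size of 400 come from the text, while the reduced-model R of .30 is a hypothetical value used only for illustration.

```python
def f_for_dropped_predictors(r_full, r_reduced, n, k_full, k_reduced):
    """F ratio for the loss in R squared when predictors are dropped.

    Degrees of freedom are (k_full - k_reduced) and (n - k_full - 1).
    """
    numerator = (r_full ** 2 - r_reduced ** 2) / (k_full - k_reduced)
    denominator = (1.0 - r_full ** 2) / (n - k_full - 1)
    return numerator / denominator


# Five-variable R = .337 (Table 3), n = 400 Ss; the four-variable R of .30
# is hypothetical.
f_drop = f_for_dropped_predictors(0.337, 0.30, n=400, k_full=5, k_reduced=4)
print(round(f_drop, 2))
```

A variable whose removal produces a significant F is carrying predictive weight not supplied by the remaining variables.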


Discussion 


Several significant points emerge from the 
results of this study. Outstanding among 
these is the curvilinear relationship between 
age and Aptitudes G, V, N, and S. The consequences 
of such a relationship are not explicitly 
clear. In general, when a given predictor 
(age in this particular case) correlates 
negatively with the other valid predictors and 
zero with the criterion, that variable con- 
tributes unwanted variance and thereby de- 
tracts from the efficiency of the prediction 
battery. 

In this particular case, since only Aptitude 
K contributes significantly to explaining the 
variance in the criterion, and this is one of 
the two variables which is not correlated with 
age in a negative manner, age does not seem 
to penalize the work evaluation scores (cri- 
terion) of this particular sample. 

There are at least two considerations which 
are necessary in interpreting the results of 
this study. In the first place, the criterion 
may be inadequate. This is a risk inherent 
in practically all applied research. In the 
second place, and this seems more reasonable, 
the results may reflect the sample used. It 
will be recalled that the occupations repre- 
sented were all in the unskilled level; it is 
quite likely that for such occupations motor 
coordination is indeed the major or only apti- 
tude measured by the GATB which is needed 
for successful job performance. This possi- 
bility should be considered in interpreting the 
results from the analysis of multiple regres- 
sion. With regard to the curvilinear relation- 
ship between age and four of the aptitudes 
measured by the GATB, this finding alone 
justifies considering age as a relevant variable 
when using the GATB to predict job per- 
formance. 


If, in repetitions of this study with samples 
drawn from different occupational groups and 
with the development of more adequate cri- 
teria, it can be demonstrated that age cor- 
relates zero with the criterion and negatively 
with the valid predictors, a correction factor 
will have to be introduced to compensate for 
age detriments. 


Summary 


This study attempted to answer the following 
questions: 


1. Is the relationship between age and ap- 
titudes as measured by the GATB in the form 
of a straight line or a curve? 

2. What is the relationship between these 
aptitudes and job performance? 

3. Which combination of aptitudes and age 
can best predict the variance in the criterion? 


The sample used consisted of 400 Ss 
equally divided into age groups ranging from 
25 to 34, 35 to 44, 45 to 54, and 55 and older. 
The GATB scores as well as job performance 
evaluations were related to age by means of 
nonlinear regression. It was found that Apti- 
tudes G, V, N, and S were related to age 
in a curvilinear manner, reaching their peak 
at ages 37, 31, 32 and 30, respectively, and 
then beginning to decline. 
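The peak ages reported above can be read off a fitted curve. The sketch below is only an illustration: it assumes the nonlinear regression was a second-degree polynomial (the text does not state the form), and the scores are made-up, noise-free values built to peak at age 37. The function names `fit_quadratic` and `peak_age` are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x**2; returns (a, b, c)."""
    # Build the 3x3 normal equations from sums of powers of x.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[0], s[1], s[2], t[0]],
         [s[1], s[2], s[3], t[1]],
         [s[2], s[3], s[4], t[2]]]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(m[r][i]))
        m[i], m[p] = m[p], m[i]
        for r in range(3):
            if r != i:
                f = m[r][i] / m[i][i]
                m[r] = [u - f * v for u, v in zip(m[r], m[i])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def peak_age(ages, scores):
    """Age at which the fitted quadratic reaches its maximum."""
    mean_age = sum(ages) / len(ages)
    centered = [x - mean_age for x in ages]  # centering keeps the fit stable
    _, b, c = fit_quadratic(centered, scores)
    if c >= 0:
        raise ValueError("fitted curve has no maximum")
    return mean_age - b / (2.0 * c)


# Hypothetical aptitude scores, constructed to peak at age 37.
ages = [float(a) for a in range(25, 66)]
scores = [110.0 - 0.05 * (a - 37.0) ** 2 for a in ages]
print(round(peak_age(ages, scores), 1))
```

With real data the fitted peak would carry sampling error, so a reported peak such as 37 should be read as an estimate rather than a sharp turning point.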

When the best prediction scheme of the 
criterion was sought, it was found that only 
Aptitude K contributed significantly to pre- 
dicting the variance in the criterion. The 
possibility was suggested that this may be 
an artifact of the sample used. 
PREDICTION OF AN ULTIMATE CRITERION 
OF SUCCESS AS A LAWYER ¹ 


M. K. DISTEFANO, JR. AND BERNARD M. BASS 


Louisiana State University 


The Law School Admission Test (LSAT), 
constructed and administered by Educational 
Testing Service, is designed to predict scholas- 
tic achievement in law school. Performance 
on this test is said to depend upon the “ability 
to read, to understand, and to reason logically 
with a variety of verbal, quantitative, and 
symbolic materials” (Educational Testing 
Service, 1954a, p. 3). The test-retest re- 
liabilities of the LSAT (short form) range 
from .67 to .87, with a median r of .77 (John- 
son, Olsen, & Winterbottom, 1955). 

The need for a predictor other than prelaw 
grades or in addition to grades is evidenced 
by the inconsistency of grading practices in 
different colleges and departments (Bass, 
1951). The Educational Testing Service in- 
vestigated the effectiveness of the LSAT in 
combination with prelaw grades in predicting 
academic success (Johnson, 1954; Johnson et 
al., 1955; Educational Testing Service, Va- 
lidity Studies Section, 1954). In these stud- 
ies both the long and short forms were used. 
The latter was found to predict as well as the 
former, if not better. In general the findings 
showed the LSAT to be a better predictor 
than prelaw grades, but that both combined 
were better predictors than either considered 
individually.³ Application of these findings 
has resulted in an appreciable reduction in the 
percentage of law school failures (Johnson 
et al., 1955). 

1This paper is based on a thesis submitted to 
Louisiana State University by the senior author in 
partial fulfillment of the requirements for the M.A. 
degree in psychology. 

2 Now at the State Colony and Training School, 
Pineville, Louisiana. 

3 The 16 Ss of this study were drawn from among 
404 examinees who had finished at least one year of 
work in the LSU law school during a seven-year 
period. For these 404 examinees, the LSAT corre- 
lated .45 with average law grades earned and .34 
(point-biserial) with tendency to graduate. Prelaw 
grades correlated .39 and .25 with law grades and 
tendency to graduate. Multiple correlations with the 
immediate and intermediate criteria were .49 and .41, 
respectively. Results were thus consistent with the 
findings of ETS. 
The present study attempted to predict an 
ultimate criterion of success as a lawyer with 
the LSAT and prelaw grades. 


Ultimate Criterion of Success 
as a Lawyer 


The ultimate criterion was the demon- 
strated legal ability of examinees after five 
years out of law school as rated by court 
judges living in the area in which the lawyers 
practiced. 

Preliminary interviews with lawyers, both 
academic and nonacademic, agreed with the 
ETS conclusion that the ultimate criteria of 
success as a lawyer were fuzzy. Yet, some 
degree of consensus was found in the evalua- 
tion of specific individuals by judges. The 
sample of 17 practicing lawyers came from 
two urban areas in the state; they had been 
graduated from the LSU law school in 1948 
through 1950. They had been administered 
the LSAT on entrance to law school. In the 
first area, three district judges were asked to 
rate 10 lawyers in their district on the basis 
of their impressions of the lawyer's legal 
ability. They were instructed to rate them 
on a five-point continuum, with a rating of 
1 being “very low in legal ability” and 5 
being “very high in legal ability.” They were 
further told to specify if a lawyer was “not 
known” to them. (This eliminated one S.) 

In the second area, seven practicing lawyers 
were rated by two district judges. The agree- 
ment among the three judges in the first 
urban group yielded interrater reliabilities of 
.82, .82, and .97. However, the correlation 
between the two raters in the second urban 
area was only .42. 
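The interrater reliabilities quoted above are pairwise correlations between judges' ratings of the same lawyers. A minimal sketch of that computation follows; it assumes product-moment correlations were used (the text does not say), and the ratings are invented for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length rating lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical five-point ratings of nine lawyers by three judges.
ratings = {
    "judge_a": [5, 4, 4, 2, 1, 5, 2, 1, 4],
    "judge_b": [5, 5, 4, 2, 1, 4, 1, 2, 4],
    "judge_c": [4, 5, 4, 1, 1, 5, 2, 1, 5],
}

# One coefficient per pair of judges, as in the three values reported.
for first, second in combinations(ratings, 2):
    r = pearson_r(ratings[first], ratings[second])
    print(first, second, round(r, 2))
```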


Results and Conclusions 


The bimodality and nonsymmetry of the 
criterion ratings suggested the use of a non- 
parametric analysis. The first urban sample 
appeared to separate naturally into five 
“highs” in rated legal ability and four “lows.” 
The second bimodal sample separated natu- 
rally into five “highs” and two “lows.” These 
16 lawyers were ranked on the LSAT scores 
they had earned upon admission to law 
school from first to sixteenth. The 10 “highs” 
according to judges’ ratings had a mean rank 
of 7.8 on LSAT scores. The 6 “lows” had a 
mean rank of 11.6. According to White’s 
test (Edwards, 1954), the difference was sig- 
nificant at the 5% level (T’ = 35). Corre- 
sponding mean ranks of the rated “highs” 
and “lows” on prelaw grades were 6.9 and 
13.4. Again, the difference was significant 
at the 5% level (T’ = 44). A ranking based 
on an optimum weighting of LSAT scores and 
prelaw grades (optimum in forecasting law 
school grades) yielded a mean rank of 7.2 
for the “highs” and 12.8 for the “lows” with 
a T’ of 38 significant at the 5% level. 
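The rank comparison described above can be sketched as a two-group rank-sum computation. This is an illustration under stated assumptions, not the authors' worksheet: T is taken as the rank sum of the smaller group and T’ as its conjugate, n1(n1 + n2 + 1) − T, which is how White's test is usually presented. The scores and group membership are hypothetical.

```python
def average_ranks(scores):
    """Rank scores from 1 (highest) downward; tied scores share the mean rank."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def rank_sums(ranks, in_smaller_group):
    """Return (T, T') for the smaller group, where T' = n1 * (n + 1) - T."""
    n1 = sum(in_smaller_group)
    t = sum(r for r, flag in zip(ranks, in_smaller_group) if flag)
    return t, n1 * (len(ranks) + 1) - t


# Hypothetical test scores for 16 lawyers; six are flagged as rated "low".
scores = [620, 580, 575, 560, 555, 540, 530, 525,
          510, 500, 490, 480, 470, 455, 440, 430]
low_positions = {3, 8, 11, 12, 14, 15}
is_low = [i in low_positions for i in range(len(scores))]

ranks = average_ranks(scores)
t, t_conjugate = rank_sums(ranks, is_low)
print(t, t_conjugate)
```

The smaller of the two sums is the value referred to the published table for the given group sizes.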


Summary 


District court judges’ ratings of 16 lawyers 
practicing in their areas for five years pro- 
vided two ultimate criterion groups. The 
highly judged lawyers were found to have 
scored significantly higher on the Law School 
Admission Test taken upon entrance to law 
school. Their prelaw grades were also found 
to have been significantly higher. 

These results were obtained despite the fact 
that the test was constructed without refer- 
ence to postschool law success; and despite 
the ambiguities in evaluating ultimate suc- 
cess as a lawyer, and the restrictions in size 
and range of the sample due to attrition in 
law school and change of career following 
graduation. 


Received April 24, 1958. 
According to Hospitals (Anonymous, 1956) 
there are over 400,000 subprofessional hos- 
pital care employees in the United States. 
Groups of applicants for these subprofessional 
positions tend to contain a large proportion 
of individuals of low caliber. Consequently, 
some means is needed whereby applicants 
who are suitable for this type of work can 
be selected. 

Only a few studies of methods for selecting 
and evaluating these necessary personnel are 
reported in the literature; for the most part, 
those that are reported concern the selection 
of psychiatric aides. The evidence presented 
by Levine (1951), Barron and Donohue 
(1951), and Yerbury, Holzburg, and Allessi 
(1951) indicates that aptitude or ability 
tests may contribute in some degree to the 
selection of subprofessional hospital care 
personnel. 

Studies using personality measures for se- 
lection report varying degrees of success in 
predicting job performance. Kline (1950), 
Yerbury, Holzburg, and Allessi (1951), and 
Love (1955) report at least some degree of 
success using personality measures of various 
types while Levine (1951) and Caudra and 
Reed (1957) report negative findings. 

It is clear that studies are needed which 
will furnish more definitive evidence on the 
validity of aptitude or ability tests and per- 
sonality tests in the selection of subprofes- 
sional hospital care personnel, especially kinds 
of personnel other than psychiatric aides. 


Subjects 


The subject group consisted of 150 incum- 
bents at two U. S. Public Health Service 
hospitals. The distribution of the group by 
grade and specialty is given in Table 1. 


1The authors wish to express their appreciation 
to Milton Epstein, formerly with the Public Health 
Service, now at Davis Memorial Goodwill Indus- 
tries, for assistance in the preparation of the tests 
and collection of data. 


Nursing Assistants at Grades 1 and 2 are 
mainly concerned with sanitary and custodial 
care of patients while those at Grades 3 and 4 
are more nearly like, and often actually are, 
registered practical nurses. Employees in the 
Medicine and Surgery specialty are concerned 
with care of patients in medical and surgical 
wards; those in the Psychiatric group are 
psychiatric attendants; those in the Operating 
Room specialty perform duties involving 
preparation of patients for surgery, care of 
operating room equipment, and scrub nurse 
functions. 


Method 


Tests and administration. The nine tests 
whose names are given in Table 2 were se- 
lected by means of Primoff’s J Scale tech- 
nique and special forms of the tests were 
developed (Primoff, 1957).² The administra- 
tion of the tests, which was carried out during 
regular working hours, was found to be diffi- 
cult due to the low ability level of the Ss and 
their unfamiliarity with objective tests. A 
number of zero scores were found on all of 
the tests. 

Criterion measure. The criterion used was 
a set of 18 12-point scales specifically de- 
signed to measure the important aspects of 
performance of the Nursing Assistant.³ These 
scales are listed in Table 2. Each scale had 
four behaviorally defined levels of perform- 
ance with three points within each level. The 
criterion score of an individual on each scale 
was the average of the confidential ratings 
given him by five professional nurses. 


2 The work of Ernest Primoff and other members 
of the Test Development Section, U. S. Civil Service 
Commission, in the selection and development of the 
tests is gratefully acknowledged. A Federal inter- 
agency committee cooperated in the test develop- 
ment project. 

3 We wish to acknowledge the aid of Senior Nurse 
Officer Mary Jenney, who collaborated in the devel- 
opment of the tests, did the professional nurse work 
essential to the development of the rating scales, and 
made other contributions to the study. 
Table 1 

Distribution of Nursing Assistants by Specialty and Grade 

Specialty                  Total 
Medicine and Surgery        112 
Psychiatric                  26 
Operating Room               12 
Total                       150 

[Entries by GS grade level are illegible in this copy.] 


Statistical analyses. Three statistical analyses were made. In one, all 
intercorrelations of scores for the total group on the 27 variables were 
computed and analyzed by Guttman’s multiple group method of factoring, 
all factors being extracted at once and orthogonalized (Guttman, 1952). 
Minor orthogonal rotations were made to clarify the structure. In 
addition, less extensive analyses were made of the scores of two more 
homogeneous groups. For one subgroup, consisting of 95 Ss at the GS-1 
or 2 level in the Medicine and Surgery specialty, test validities in 
predicting Scales 1 and 8 (see Table 2) and the intercorrelations of 
the tests were computed. For the other subgroup, consisting of 26 
Psychiatric Nursing Assistants, only Scale 1 validities were computed. 


Results and Interpretation 


Intercorrelation matrix. From the complete matrix of intercorrelations 
for the total 

Table 2 


Factor Loadings: Criterion Ratings and Selection Tests 

[Loadings on Factors I, II, and III are illegible in this copy; only the row labels survive.] 


Criterion Ratings 

1. Dexterity and Adaptability in Handling Equipment 
2. Awareness of Patients’ Needs 
3. Recognizing Safety Hazards and Adherence to Aseptic Technique 
4. Accuracy in Carrying out Procedures 
5. Accuracy in Making and Reporting Observations 
6. Organization of Work 
7. Checking of Inventories 
8. General Fitness as a Nursing Assistant 
9. Emotional Control 
10. Accepting Changes in Assignment 
11. Acceptance of Criticism 
12. Relationships with Co-workers 
13. Discretion Exercised When Speaking 
14. Dependability in Carrying out Assignments 
15. Standards of Cleanliness and Order 
16. Adherence to Hospital Policies and Regulations 
17. Attendance and Promptness 
18. Appearance 


Tests 

Gross Dexterity 
Background for Handling Nursing Equipment* 
Nursing Perception 
Number Checking 
Coding Memory 
Verbal Ability 
Fine Dexterity 
Arithmetical Ability 
Ability to Follow Oral Directions 


* A test of mechanical comprehension using hospital equipment in the items. 
group it was seen that the degree of relation- 
ship between test scores and criterion scales 
was rather small, but that some of the scales 
correlated more highly with the tests than 
others.⁴ All the scales intercorrelated more 
highly among themselves than they inter- 
correlated with the tests, and all the tests 
intercorrelated highly. 

Factor analysis. As can be seen from the 
results of the factor analysis in Table 2, 
there are two clearly defined factors, per- 
formance ratings (Factor I) and ability tests 
(Factor II), and a fairly definite third, the 
factor common to both the ratings and tests 
(Factor III). All the tests have moderate 
and approximately equal loadings on Factor 
III, as do about half of the criterion scales, 
while the other half of the scales have zero 
or negative loadings. 

The relatively high intercorrelations among 
the tests and their high loadings on Factor 
II, the ability test factor, indicate that the 
tests are not measuring in any great degree 
the specific abilities they were designed to 
measure, but are rather measuring a general 
ability for the most part. 

The differentiation between the predictable 
and unpredictable criterion scales does not 
make for a completely clear-cut interpreta- 
tion of Factor III, but the scales with high 
loadings seem to be those which deal with 
the Nursing Assistant’s ability to perform 
tasks while the low loadings are those of 
scales relating to aspects of the job more com- 
pletely dependent on personality and social 
factors. The highest loading on Factor III 
is that of Scale 1, Dexterity and Adaptability. 
The other scales with loadings over .40 on 
this factor, Scales 2-7, also have to do pri- 
marily with task performance aspects of the 
job. On the other hand, Scales 10-18, such 
as Acceptance of Criticism, Relationships with 
Co-Workers, and Adherence to Hospital Poli- 
cies and Regulations, which reflect person- 
ality, work attitudes, and social behavior, 
have zero or negative loadings. General Fit- 
ness, Scale 8, has a positive loading, but is 
lower than Scales 1-7. The only important 
contradiction to the differentiation of scales 
is provided by Scale 9, Emotional Control, 
which has a moderate positive loading of .29. 


4 The complete matrix of intercorrelations for the 
total group has been deposited as Document number 
5807 with the ADI Auxiliary Publications Project, 
Photoduplication Service, Library of Congress, 
Washington 25, D. C. Copies may be secured by 
citing the Document number and by remitting $1.25 
for photoprints, or $1.25 for 35 mm. microfilm. 
Advance payment is required. Make checks or 
money orders payable to: Chief, Photoduplication 
Service, Library of Congress. 


The differential predictability of the rating 
scales is somewhat surprising in view of the 
high correlation observed among all the scales. 
This differential predictability indicates that 
the degree of validity evidenced by predictors 
of the ability type depends to a considerable 
extent on the degree to which the criterion is 
composed of task performance aspects of the 
job. Presumably, success of prediction would 
be greatly enhanced if appropriate measures 
of the relevant personality variables, as re- 
flected in the scales having loadings around 
zero on Factor III, could be devised. 

Validity coefficients. More specific infor- 
mation on the test validities is furnished in 
Table 3. This table gives the validity coeffi- 
cients based on Dexterity and Adaptability, 
Scale 1, the scale which had the highest load- 
ing on Factor III, and on General Fitness, 
Scale 8, for both the total group and the 
Medicine and Surgery subgroup. Only va- 
lidities based on Scale 1 are given for the 
small Psychiatric subgroup. 

Validities for the total group in predicting 
over-all General Fitness, Scale 8, range from 
.11 to .20; the four highest of these coeffi- 
cients are significant at the .05 level. Va- 
lidities for Dexterity and Adaptability, Scale 
1, are somewhat higher, ranging from .19 to 
.30; two of these are significant at the .05 
level, and the rest, at the .01 level. The rela- 
tive predictability of these two scales is con- 
sistent with the results of the factor analysis. 
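Whether a validity coefficient of a given size is significant depends on the sample size. The sketch below converts r to a t ratio with N − 2 degrees of freedom; it assumes a two-tailed test of r against zero, which the text does not state explicitly. With 148 degrees of freedom the critical values are roughly 1.98 (.05 level) and 2.61 (.01 level).

```python
from math import sqrt


def t_for_r(r, n):
    """t ratio for testing a correlation r from n cases against zero."""
    return r * sqrt((n - 2) / (1.0 - r * r))


# Coefficients spanning the ranges reported for the total group (N = 150).
for r in (0.11, 0.19, 0.20, 0.30):
    print(r, round(t_for_r(r, 150), 2))
```

On this reading the lowest Scale 8 validity (.11) falls short of the .05 level while .20 clears it, which agrees with the pattern of significance described in the text.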

The validities of the Medicine and Surgery 
subgroup are about the same as those for the 
total group, which is not surprising since this 
category constitutes about 60% of the total. 
In this subgroup as well as in the total group, 
multiple correlational analysis indicated that 
the increase in prediction from the use of 
more than one predictor was of neither prac- 
tical nor statistical significance. 

The test validities in the small Psychiatric 
subgroup ranged from .10 for Arithmetical 
Table 3 


Validity Coefficients by Subgroup and Performance Rating Scale 


Column groups: Psychiatric Subgroup (Scale 1, N = 26); 
Medicine and Surgery Subgroup (Scale 8, N = 95; Scale 1, N = 95); 
Total (Scale 8, N = 150; Scale 1, N = 150) 


Gross Dexterity* 

Background for Handling Nursing 
Equipment 

Nursing Perception 

Number Checking 

Coding Memory 

Verbal Ability 

Fine Dexterity* 

Arithmetical Ability 

Ability to Follow Oral Directions 


4” 


.26 
.13 
37 


AM .20* .19* a 


18 
.27* 
27° 
-— 
.22* 
.24* 
.24* 
D> 


14 
ll 
.18* 
BD a 
14 
12 
14 
.20* 


.20* 

.26** 
23"* 
.24** 
a" 
a 
.19* 

a 





* Scatter plots indicated that the correlations for the Gross and Fine Dexterity tests would have been about the same as shown 
in the table if the zero scores, of which there were an undue number, were eliminated; the validities of the dexterity tests, however, 
may not reflect the degree to which manual dexterity is involved in the job. 

* P < .05. 

** P < .01. 


Ability to .45 for Gross Dexterity.⁵ The 
rather high validity for the Gross Dexterity 
test in this group is due primarily to the fact 
that Ss who received zero scores on this test, 
nine out of 26, tended to have low ratings. 
Since the zero scores are almost sure to be 
due to a failure to understand or follow direc- 
tions rather than a complete lack of manual 
dexterity, the test of Ability to Follow Oral 
Directions probably best represents the abil- 
ity which is related to job performance. Also, 
inspection of the distribution of scores for all 
the tests led to the conclusion that the test of 
Ability to Follow Oral Directions was of the 
most appropriate difficulty level for the Psy- 
chiatric as well as the other groups of Nurs- 
ing Assistants. 

Operational usefulness of selection tests. 
The practical usefulness of the Oral Direc- 
tions test, which is deemed the most appro- 
priate for Nursing Assistants, can be illus- 
trated by dividing the Medicine and Surgery 
group into quintiles and the Psychiatric group 


5 The apparent inconsistency in the significance of 
the validities for the Psychiatric subgroup as given 
in Table 3 (where the correlation of .35 for Oral 
Directions is significant, but .38 for Coding Memory 
is not) is due to the fact that the significance of the 
degree of association was established from a 2 X 2 
median-split contingency table using Fisher’s Exact 
Test. 


at the median on performance as reflected by 
Scale 1 and comparing the test score dis- 
tributions of the upper and lower performing 
groups. In the Medicine and Surgery sub- 
group, 74% of the lowest fifth on the test are 
low performers while only 26% are high per- 
formers. These figures are nearly reversed 
for the highest fifth on the test; 28% of this 
group are low performers while 72% are high 
performers. For the Psychiatric subgroup, 
69% of those above the median on the test 
are high performers and 31% are low, while 
the group below the median on this test is 
composed of 23% high performers and 77% 
low performers. Thus it appears that the av- 
erage level of performance could be signifi- 
cantly raised through the use of this test. 
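The median-split comparison for the Psychiatric subgroup can be checked with Fisher's exact test, the procedure named in footnote 5. The cell counts below are not given in the text; they are inferred from the reported percentages on the assumption that the 26 Ss split 13 above and 13 below the test median (9 and 4 high and low performers above, 3 and 10 below). The function name is hypothetical.

```python
from math import comb


def fisher_one_sided(a, b, c, d):
    """P(top-left cell >= a) with all margins fixed, for the table [[a, b], [c, d]]."""
    row1 = a + b          # Ss above the test median
    col1 = a + c          # high performers
    n = a + b + c + d
    largest = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, largest + 1))
    return tail / comb(n, row1)


# Rows: above / below the median on Oral Directions.
# Columns: high / low performers on Scale 1 (counts inferred, see above).
p = fisher_one_sided(9, 4, 3, 10)
print(round(p, 4))
```

Under these assumed counts the one-sided probability is about .024, which is in line with the footnote's statement that the association for Oral Directions is significant.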


Summary and Conclusions 


Nine ability tests developed by the United 
States Civil Service Commission as a coopera- 
tive project seem to predict the work perform- 
ance of Nursing Assistants in the U. S. Public 
Health Service to some degree. The tests are 
of use in screening out applicants of too low 
an ability level to adapt successfully to the 
task performance aspects of the work. The 
validity of the predictors appears to be pri- 
marily due to a general ability rather than 
abilities specific to individual tests. The test 
of Ability to Follow Oral Directions is per- 
haps most appropriate because of the satis- 
factory distribution of scores it gives with a 
group of this ability level and because it prob- 
ably measures best the ability which appears 
to be producing the validity. 

The 18 criterion scales measure a general 
performance factor, Factor I. In addition 
about half of the scales, those measuring the 
task performance aspects of the work, com- 
bine with the tests to constitute Factor III. 
The ability to perform the tasks involved in 
the work, then, is predictable to some degree. 
The aspects of the work represented in those 
scales with low or negative loadings on Fac- 
tor III, which appear to deal with person- 
ality, social, and work attitude characteristics, 
are not predictable by the tests employed in 
the study. To measure those characteristics, 
other tests would have to be constructed. 


Received April 28, 1958. 
A major requirement in establishing the ac- 
ceptability of foods for military use is that of 
taste-testing. The major purposes of these 
tests are to evaluate samples submitted by 
food processors intending to bid on procure- 
ment contracts, to determine the effects on 
preference of certain processing variables, and 
to assess the degrees of liking for new foods. 

In connection with this service function, a 
considerable amount of criterion and meth- 
odological research on affective evaluations 
has been in progress for several years. For 
the most part, the rationale for the specific 
problems investigated has been based on em- 
pirical and practical considerations, and the 
results have proved useful in improving the 
reliability and interpretability of taste-test 
data. However, it was felt that increasing 
emphasis on theory will lead to greater inte- 
gration of findings and will facilitate applica- 
tions of methodological research. 

A simple model was developed as a starting 
point. The research strategy was to set forth 
tentative and perhaps over-simplified assump- 
tions based largely on observations of the test- 
ing process and subsequent results, to derive 
and test hypotheses, and to revise the model 
accordingly. The broader implications of the 
assumptions and results will be discussed 
later. 


Assumptions 
1. An individual evaluating a given food 
item in terms of like and dislike bases his 
judgment on the presence or absence of sev- 
eral characteristics of the food. 


1 This paper reports research undertaken at the 
Quartermaster Food and Container Institute for the 
Armed Forces, and has been assigned Number 843 
in the series of papers approved for publication. The 
views or conclusions contained in this report are 
those of the author. They are not to be construed 
as necessarily reflecting the views or endorsement of 
the Department of Defense. 

2 The writer expresses his appreciation to Norman 
Gutman, Chief, Statistics Branch, and Bettye John- 
son, Statistical Clerk, for their valuable assistance. 
Special acknowledgment is made to Phyllis Whitmer 
and Audrey Beauvais, Home Economists, Food Ac- 
ceptance Branch, who performed almost all the labo- 
ratory work. 


2. Characteristics of foods are of two types: 
negative and positive. For any food, charac- 
teristics of both types are likely to be present. 

3. (a) Noticing the absence of a positive 
characteristic will result in a lower preference 
rating for the food. 

(b) Noticing the presence of a negative 
characteristic will result in a lower preference 
rating for the food. 

4. (a) When the positive characteristics of 
a good quality food (a “good”) predominate, 
the presence of some of the negative charac- 
teristics is not noticed or taken into consid- 
eration. 

(b) When the negative characteristics of 
a poor quality food (a “poor”) predominate, 
the presence of some of the positive charac- 
teristics is not noticed or taken into considera- 
tion. 

5. (a) Presentation of a “poor” increases 
an individual’s awareness of the presence of 
some of the same negative characteristics in 
a “good.” 

(b) Presentation of a “good” increases 
an individual’s awareness of the absence of 
some of the same positive characteristics in 
a “poor.” 

6. (a) As successive samples of a “poor” 
are served, forgetting the absence of some 
positive characteristics takes place. 

(b) As successive samples of a “good” 
are served, forgetting the presence of some 
negative characteristics takes place. 


Experimental Implications 


It can be shown that the above assump- 
tions lead to the following predictions. 

1. A “poor” will be rated lower when pre- 
ceded by a “good” than when it is preceded 
by another “poor.” Consequently, the dif- 
ference in mean preferences between a “good” 
and “poor” should be larger when the ratings 
of the “poor” are obtained after a “good” 
than after another “poor.” This effect is 
called contrast. 

2. A “good” will be rated lower when pre- 
ceded by a “poor” than when it is preceded 
by another “good.” Thus, the difference in 
mean preferences between a “good” and 
“poor” should be smaller when the ratings of 
the “good” are obtained after a “poor” than 
after another “good.” This effect is called 
convergence. 

3. Preference will increase with successive 
servings of the same quality, provided no op- 
posite quality intervenes. 


Method 


Four independent replications of the experi- 
ment were conducted on four separate days 
between August, 1957, and January, 1958. 


Foods 


The food tested in the first two replications was a 
cherry beverage made from a concentrated liquid 
base. The food tested in the last two replications 
was a beef broth prepared from a granulated base. 
On the basis of results from previous taste tests, 
“good quality” lots of each food were selected. 
“Poor quality” samples of each food were prepared 
as follows: For the cherry beverage on the first rep- 
lication, 44 ml. of vinegar and .55 g. of caffeine were 
added to the standard ingredients of 570 g. of sugar, 
4500 ml. of water, and 95 ml. of beverage base. On 
the second replication, adulterating ingredients were 
13.4 g. “liquid smoke” and 10.8 ml. vinegar. For 
the beef broth on both replications, the powder was 
partially burned, thereby producing a definite acrid 
taste. Since the burning could not be precisely con- 
trolled, the tastes of the “poors” on the two repli- 
cations were not identical. The holding and serving 
temperature of the beverage was 50° F., and of the 
broth, 100° F. 


Table 1 


Orders of Presentation and Qualities Presented 
for Four Treatments 


             Order of Presentation 
Treatment    First    Second    Third    Fourth 
1            Good     Poor      Good     Poor 
2            Poor     Poor      Poor     Good 
3            Good     Good      Good     Poor 
4            Poor     Good      Poor     Good 


Judges 


The judges (Os) were randomly drawn from a 
larger pool of about 700 civilian and military, both 


Table 2 





Treatment Replication 


1 


3.6 
3.7 
3.8 
6.0 


a A high rating signifies a high preference. 
b “G” and “P” represent “Good” and “Poor,” respectively. 


Second Fourth 


3.4 (P) 3.9 (P) 
3.0 . 2.5 
3.4 “ 3.2 
3.5 : 4.2 


4.2 (P) 5. 6.9 
6.3 a. 6.4 
5.2 5. 7.0 
6.1 5. 5.5 


3.3 (P) 
1.9 

15 4.1 

6.8 3.2 


3.3 (P) 6.5 
3.8 6.8 
2.9 6.8 
4.2 6.1 





Contrast and Convergence in Ratings of Foods 


Table 3 


Analyses of Variance of Preference Ratings 


Source of Variation ᵃ 

1. Poor, 2nd (Treatments 1 vs. 2) 
2. Good, 2nd (Treatments 3 vs. 4) ᵇ 
3. Poor, 1st & 3rd (Treatments 2 vs. 4) ᵇ 
4. Good, 1st & 3rd (Treatments 1 vs. 3) ᵇ 
5. 1st & 2nd vs. 3rd & 4th 
6. 1st & 3rd vs. 2nd & 4th 
7. 1st & 4th vs. 2nd & 3rd 
8. Good vs. poor 
9. Good, 4th (Treatments 2 vs. 4) 
10. Poor, 4th (Treatments 1 vs. 3) ᵇ 
11. Good vs. poor, 1st & 2nd vs. 3rd & 4th 
12. Good vs. poor, 1st & 3rd vs. 2nd & 4th 
13. Good vs. poor, 1st & 4th vs. 2nd & 3rd 
14. Poor, 1st & 3rd; 1st & 4th vs. 2nd & 3rd 
15. Good, 1st & 3rd; 1st & 4th vs. 2nd & 3rd 
16. Judge (within groups)              df = 36 
17. Judge-treatment (within groups)    df = 108 

18. Total                              df = 159 


* Ordinal values refer to position of 


First 
Replication 
(Cherry Beverage) 
Mean 
Square 


3.2000 


3.2000 


7.2250 


3.0250 


.9000 


3.6000 


.9000 


366.0250 


.8000 


1.8000 


3.0250 


9.0250 


.0250 


6.1361 


1.4546 


sample. 


b Evaluated against judge (within groups). 


error term 
c One-tailed test. 
d Two-tailed test. 


Mean 
p Square 


54.4500 
.0500 
30.6250 
.6250 
8.1000 
22.5000 


32.4000 
<.01° 324.9000 
.8000 


1.8000 


28.9000 


<.01¢ 9.0250 


1.2250 
4.7903 


1.5106 


Second 
Replication 
(Cherry Beverage) 


p 


Replication 
(Beef Broth) 


Mean 
Square 


16.2000 
1.2500 
24.0250 
4000 
8.1000 
2.5000 


3.0250 
462.4000 


.2000 


5.0000 


5.6250 


4000 
4.9292 


1.6495 


Fourth 
Replication 
(Beef Broth) 


Mean 
p Square p 


<.05¢ 33.8000 <.01° 
-8000 
14.4000 
.9000 
40.0000 
40.0000 


0 
96.1000 


1.8000 


5.000 
1.6000 


8.1000 


1.6000 
5.1472 


1.8306 


In all other comparisons, judge-treatment (within groups) interaction is the 
male and female, employees who regularly partici- 
pate in taste tests. Departures from randomness oc- 
curred when some were absent or were otherwise 
not available on the days the tests were conducted. 
Separate selections of 40 Os were made for each of 
the four replications. 


Procedure 


Each group of 40 Os was randomly assigned to 
one of four treatments that differed in the order in 
which the different qualities were presented and the 
number of each quality rated. The nature of the 
treatments is summarized in Table 1. 

Each O sat in a semienclosed testing booth. Two- 
ounce samples in coded cups or glasses were pre- 
sented one at a time through a turntable in a wall 
separating the booth from the kitchen. O drank as 
much or as little as he wanted of each sample, rated 
the product on a nine-point scale described elsewhere 
(Peryam & Pilgrim, 1957), and rinsed his mouth 
ad libitum with charcoal-filtered distilled water. The 
time between the rating of one sample and the pres- 
entation of the next was 45 seconds. 

On the first replication, the Os, after rating the 
third and fourth samples, were asked to list the posi- 
tive and negative characteristics of each sample. 
Their lack of ability to so verbalize led to the de- 
cision to discontinue these questions on the later 
replications. 


Results 


The preference means of each treatment on 
each replication are given in Table 2. An 
analysis of variance of the preference ratings 
was performed for each replication separately, 
and the results of these analyses are presented 
in Table 3. In each case the ratings of the 
“poors” were clearly lower than the ratings of 
the “goods” (Source of variation No. 8). 


Contrast Effects 


Inspection of Table 2 reveals that in every 
replication, the average rating of “poor” in 
the second position was lower when it was 
preceded by a “good” than when it was pre- 
ceded by another “poor” (Source of variation 
No. 1). Three of the four differences were 
significant at either the .05 or .01 level. Since 
the combined probability (Wilkinson, 1951) 
is less than .001, it is concluded that contrast 
effects have been demonstrated. 
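
The combined probability can be checked with a short calculation. The sketch below assumes that the Wilkinson (1951) procedure amounts to the binomial probability of obtaining at least k significant results among n independent tests at level alpha, and it treats all three significant differences as significant at the .05 level (a conservative simplification, since some reached .01). The function name is illustrative and is not taken from the source.

```python
from math import comb


def wilkinson_combined_p(n_tests: int, n_significant: int, alpha: float) -> float:
    """Probability of at least n_significant results at level alpha
    among n_tests independent tests, under the null hypothesis."""
    return sum(
        comb(n_tests, k) * alpha**k * (1 - alpha) ** (n_tests - k)
        for k in range(n_significant, n_tests + 1)
    )


# Three of four replications significant at the .05 level or better.
p = wilkinson_combined_p(n_tests=4, n_significant=3, alpha=0.05)
print(f"{p:.6f}")  # 0.000481
```

Under these assumptions the result is below .001, in agreement with the figure reported in the text.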

Convergence Effects 

Tables 2 and 3 fail to show any consistent
or significant (Source of variation No. 2)
difference in the ratings of the “goods”
regardless of whether a “poor” or “good”
preceded. The prediction regarding convergence
effects was not confirmed.


Effect of Successive Presentations 


It was predicted that as the “poors” are 
successively presented, the ratings would in- 
crease, but would not increase if a “good” 
intervenes. This means that the algebraic 
difference between the third and first samples 
should be greater for those in Treatment 2 
than for those in Treatment 4. The compari- 
son of Source of variation No. 14 with the 
judge-treatment interaction constitutes the 
appropriate test of this prediction. Signifi- 
cant differences at the .05 or .01 levels were 
attained for three replications; for the re- 
maining replication, the results were in the 
expected direction though not significant. 
The combined probability is less than .001. 
The hypothesis may be considered confirmed. 

It was also predicted that as the “goods” 
are successively served, the ratings would
increase, but would not increase if a “poor”
intervenes. As with the preceding prediction,
the algebraic difference between the
third and first samples should be greater for 
Treatment 3 than for Treatment 1. How- 
ever, when Source of Variation No. 15 was 
tested against the judge-treatment interaction, 
no significant effects emerged. Inspection of 
Table 2 shows that the differences were not 
always in the expected direction. Hence, 
there is no support for the hypothesis. 


Other Tests of Significance 


The significance of Source of Variation No. 
5 shows that the later samples are preferred 
more than the earlier ones. However, this 
and other significant sources of variation are 
of only incidental interest here and will not 
be further discussed. 


Discussion 


Both predictions concerning changes in 
preference for the “poors” were substantiated, 
and both predictions involving changes in 
preference for the “goods” were not. In fact, 
the ratings of the “goods” remained almost 
invariant regardless of the nature or number 
of samples preceding them. It is possible that 
the “goods” were so good that the negative 
characteristics, present to a marked degree in 
the “poors,” were completely absent or below
threshold. Indeed, on the first replication, 
Os were unable to specify anything negative 
about the “good.” Absence of negative char- 
acteristics makes Assumptions 3(a), 5(a), 
6(b), and part of Assumption 2 inapplicable.

Obviously, definitive results could be ob- 
tained only with the accompaniment of an 
independent assessment of the presence and 
absence of both positive and negative charac- 
teristics. However, present methods for de- 
termining these characteristics are not satis- 
factory, primarily because judges are unable 
reliably and independently to describe their 
introspective experiences. Currently, research 
is being considered on psychometric multivariate
methods for inferring these characteristics
without resort to verbalizations by the judges.³

Because the positive and negative charac- 
teristics were not independently established, 
this experiment cannot be considered to be a 
crucial test of the validity of the assumptions. 
Apart from the consideration that develop- 
ment of methods for assessing these charac- 
teristics will enable a rigorous test of the as- 
sumptions, tentative retention of the assump- 
tions set forth here should prove useful. 

First, it is advocated that the following ad- 
ditional assumption be added: Presentation of 
a “poor” increases an individual’s awareness 
of the presence of positive characteristics in 
a “good,” i.e., the individual doesn’t appreci- 
ate the excellence of a “good” until the ab- 
sence of the positive characteristics in the 
“poor” makes him cognizant of their presence 
in a “good.” 

When there is no independent determina- 
tion of the individual’s perception of the pres- 
ence of positive and negative characteristics, 
inclusion of this assumption would preclude 
derivation of certain predictions, such as the 
ones regarding convergence effects in the pres- 
ent experiment. On the other hand, even 
when there is no independent determination, 
predictions of more complex phenomena can 
be made. 

Consider, for example, four types of an
orange soda pop: carbonated, slight off-flavor;
carbonated, marked off-flavor; noncarbonated,
slight off-flavor; noncarbonated, marked
off-flavor. If it is independently demonstrated
that most people prefer the carbonated
beverage over the noncarbonated one
and that the slight off-flavor beverage is
preferred to the marked off-flavor one, it can be
shown that at least seven predictions are
deducible. For example, it would be predicted
that a carbonated, slight off-flavor sample
will tend further to depress the ratings of a
carbonated, marked off-flavor sample that
follows it; and a noncarbonated, marked
off-flavor sample should have the opposite effect.
Testing of these predictions is contemplated.

3 Impetus for this line of psychometric research
was given by results of certain preference tests
wherein departures from the postulates of order
(transitivity and asymmetry) were evident among
different samples of the same product.

Another reason for at least tentatively re- 
taining the previous assumptions is that they 
focus attention and may provide answers to 
problems facing manufacturers of consumer 
goods. Consider the case, for example, of a
manufacturer of a hi-fi component which is 
well liked by its users. Suppose, also, that he 
is considering adding to his line an improved 
component, which, because of its extra cost, is 
not expected to be bought by as many people. 
The question arises: Will the new component 
cause a decreased preference level for the 
older one with a consequent reduction in sales 
for it, or will the effect be mainly one of in- 
creased liking for the new and no change in 
liking for the old? 

Similarly, when those who have had pleas- 
ant experiences with such optional automobile 
equipment as automatic transmissions and 
power brakes are in the market for a new car, 
their preference level for autos without these 
accessories may decline, while their preference 
levels for autos with these remain constant. 
If level of preference is related to willingness 
to buy and if such a decline in preference oc- 
curs, then those who are able to afford just 
the basic auto might rather forego its pur- 
chase or buy a used one with these extras. 

Thus, in many cases where contrast effects 
appear, it is important to determine whether 
preference for the “good” rises or whether 
preference for the “poor” declines. To the 
extent that the model is able to predict which
of the two, the “good” or the “poor,” is
responsible for the increased or decreased
differences in preference, its bearing on
marketing problems increases.





Joe Kamenetzky 


Summary 


A set of assumptions was made that led to 
the hypothesis that preference ratings for poor 
quality food will be lower when preceded by 
a good quality food than when preceded by 
another poor quality item (contrast effects). 
It was also hypothesized that preference for 
a good quality food will be higher when pre- 
ceded by another good quality item than 
when preceded by a poor quality product 
(convergence effects). The other predictions
were that preference will increase with
successive presentations of the same quality item,
provided no opposite quality intervenes. The
predictions concerning preference for the poor
quality foods were clearly confirmed, but
those involving the good quality foods were
not substantiated. Experimental and practical
implications of the assumptions and results
are discussed.


Received May 1, 1958. 
DISCUSSION AS A FUNCTION OF ATTITUDES
AND CONTENT OF A PERSUASIVE
COMMUNICATION ¹
Several recent investigations have demonstrated
the importance of considering the
initial divergency between the opinions of the 
recipients of a communication and the posi- 
tion advocated by the communication. Hov- 
land, Harvey, and Sherif (1957) report that 
those favoring the stand taken by a com- 
munication judged it to be more fair and 
factual as well as closer to their own posi- 
tion than did those disagreeing with the com- 
munication content. Persons closer in their 
attitudes to the communication content also 
tended to change more in that direction 
than individuals further removed from the 
communicator’s position. Utilizing discus- 
sion group situations contrasted with nondis- 
cussion situations, Mitnick and McGinnies 
(1958) found that shifts in attitude as well 
as amount of information learned from a film 
were greater among the members of discus- 
sion groups than among passive audiences. 
For Ss who disagreed with the communica- 
tion content, however, this effect was signifi- 
cantly less in the discussion groups than in 
the nondiscussion groups, indicating that atti- 
tude change was being influenced differen- 
tially in discussion according to initial atti- 
tudes of the discussants. The amount of 
factual material retained after film viewing 
was positively related to agreement with the 
content of the film in all groups. 

In examining the rather extensive discussion
protocols collected by Mitnick and McGinnies,
we were impressed with certain apparent
discrepancies between the various attitude
groups with respect to their discussion
behavior. Further analysis of these data with
attention to several quantitative features of
the discussions promised to shed light upon
the interaction between attitudes and
communication content as a factor determining
the discussion process itself. This line of
approach focused upon the verbal behavior of
the communication recipients rather than
upon the attitudinal consequences of the
experimental procedure.

1 This research was supported in part by a special
grant from the National Institute of Mental Health,
National Institutes of Health, U. S. Public Health
Service. Leonard Mitnick very kindly made available
the data upon which the present analysis is
based.

2 Now with Human Sciences Research, Inc.,
Arlington, Va.

A number of quantitative procedures de- 
signed to explore discussion behavior have 
been reported. One of the earliest of these 
was proposed by Chapple (1940) who was 
concerned with the frequency and duration 
of verbal behavior. His method, described
by Heyns and Lippitt (1954) as “an essentially
contentless observational system,” relied
upon measurements of certain temporal
aspects of interpersonal communication which
Chapple felt could be used as predictors of 
other features of the interaction process with- 
out depending upon inferences about the mo- 
tives of the participants. This type of pro- 
cedure contrasts with that developed by 
Bales (1950), as well as with those described 
by Bradford and French (1948). Several in- 
vestigators, however, have followed Chapple 
in emphasizing a formal or a quantitative 
mode of analysis of verbal behavior in discus- 
sion groups. Stephan and Mishler (1952) 
found that a simple exponential function ade- 
quately describes the distribution of partici- 
pation among the members of small groups. 
A study by Findley (1948) focused upon dis- 
cussion participation, and the author presents 
a statistical index which is maximized when 
participation is equally distributed and mini- 
mized when two individuals monopolize the 
discussion. Dickens (1953) has also proposed 
a measure of the spread of participation in 
group discussion. 

In the present paper, several statistical
indices of group discussion behavior are
described which should prove applicable in
many types of discussion situations and 
which are descriptive of group reactions to 
a controversial communication. Although 
measures of this general type have been used 
previously, this essentially statistical ap- 
proach to discussion behavior has not been 
related to the known attitudinal character- 
istics of target audiences. As will be indi- 
cated, attitudinally homogeneous groups differ 
in several specific respects when they discuss 
a communication that is related to their ini- 
tial views. 


The Experiment 


Measures employed. Five statistical in- 
dices of group discussion behavior were se- 
lected on the basis of prior research as being 
descriptive of important aspects of the dis- 
cussion process. The measures were: (a) 
verbal output, (b) participations, (c) rate of
response, (d) spontaneity, and (e) recruit- 
ment. The operations determining each of 
these measures will be described in reporting 
the results. 


Method. The data upon which the several appli- 
cations have been based were collected by Mitnick 
and McGinnies (1958) for the purpose of determin- 
ing the effectiveness of a sound film in modifying 
ethnocentric attitudes. Briefly, their study investi- 
gated the effects of a film, The High Wall, upon the 
responses of high school students to an adaptation 
of the California ethnocentrism (E) scale. Some of 
the groups discussed the film, while others merely 
viewed it. We shall be concerned here only with 
the discussion groups. On the basis of pretesting 
with the E scale, three types of discussion groups 
were formed. These included (a) Ss with high 
scores, indicating marked ethnocentric attitudes, (b) 
Ss with low scores, indicating relative lack of per- 
sonal ethnocentrism, and (c) Ss with middle scores, 
indicating either indifference or indecision with re- 
spect to ethnocentric convictions. Since the film in 
question was produced for the purpose of counter- 
acting prejudice against minority groups, it was as- 
sumed that the high E Ss were antagonistic to the 
communication, the low E Ss were favorable to the 
communication, and the middle E Ss were neutral 
with respect to the communication. The groups 
were balanced for size, each consisting of nine mem- 
bers, and no significant differences appeared in their 
relative socioeconomic status or general intelligence 
scores. All were formed randomly, so that no es- 
tablished sociometric patterns were involved. The 
same discussion leader served for all of the groups 
and assumed a permissive or nondirective role.³


3 We are grateful to Willard Vaughan, who acted 
as discussion leader for the groups. 
Two groups were formed for each type of
predisposition, so that a total of six discussion groups,
containing 54 members, furnished the data upon 
which the present findings are based. The discus- 
sions were limited to a half-hour, and all were tape- 
recorded for later analysis. The findings relating 
to attitude change have been reported elsewhere 
(Mitnick & McGinnies, 1958). We will limit our- 
selves in this paper to descriptions of the discussions 
in the three predisposition groups in terms of the 
measures developed. 


Results 


Verbal output. This measure is obtained 
by simply counting the number of words 
emitted by all members of the discussion 
groups. The data from the two groups rep- 
resenting each degree of ethnocentrism were 
combined in order to give a more reliable in- 
dication of behavior within the attitudinal 
categories. Inspection of the mean word 
counts reveals significant differences in verbal 
output among the six predisposition groups. 
The two most active groups in terms of sheer 
verbal productivity were composed of those 
Ss who were assumed on the basis of their 
low E scores to favor the communication. 
Second to these individuals in verbal output 
were the Ss with high E scores, presumed to 
be antagonistic to the communication. Least 
productive in terms of verbal response were 
the Ss selected from the middle of the E-score 
distribution. Frequency distributions of ver- 
bal output in each of the three attitude groups 
are presented in Fig. 1. A distinctly bimodal 
distribution is present in the case of the low 
Es, with the peaks occurring in the class in- 
tervals of 1-200 and 800-1000 words. Only 
two of the 18 individuals in these two discus- 
sion groups had nothing to say. Six of the 
18 high Es declined to participate in the dis- 
cussion of the film, and relatively few are 
found in the higher output categories. Finally,
the middle Es were observed to be extremely
reticent in the discussion situation,
nine of them declining to speak at all.

The evident J shape in the case of the
middle E groups may be interpreted as
indicating a constriction in the range of verbal
output in moving from greater to lesser
attitudinal involvement with the communication.
The mean numbers of words emitted by
individuals in the three predisposition conditions
were: low E = 45), high E = 296, and
middle E = 175. Since the respective
distributions are badly skewed, the significance of the
differences among these means was evaluated
by a nonparametric analysis of variance
suggested by Walker and Lev (1953). The
probability that the means were drawn from
the same population is less than .05. We may
conclude with reasonable certainty, therefore,
that the three degrees of ethnocentrism
represented by the experimental groups
produced significant differences in verbal
output among these groups during a discussion.
These differences are reflected not only in the
means but also in the relative skewness of the
respective distributions.
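
The text does not name the nonparametric analysis of variance that was used. As an illustration only, the sketch below assumes a rank-based test of the Kruskal-Wallis type: all observations are ranked together, and the H statistic is computed from the rank sums of the groups. The word counts are hypothetical and are not the study's data; the tie correction is omitted for brevity.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are zero-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """H statistic (no tie correction) for a list of independent samples."""
    pooled = [x for group in groups for x in group]
    ranks = average_ranks(pooled)
    n = len(pooled)
    weighted_sum = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        weighted_sum += rank_sum**2 / len(group)
        start += len(group)
    return 12.0 / (n * (n + 1)) * weighted_sum - 3 * (n + 1)


# Hypothetical word counts per individual (not the study's data).
low_e = [950, 820, 610, 450]
high_e = [520, 300, 280, 150]
middle_e = [240, 120, 60, 10]

h = kruskal_wallis_h([low_e, high_e, middle_e])
# With three groups, H is referred to chi-square with 2 df (5.991 at the .05 level).
print(h > 5.991)
```

A rank-based statistic of this kind is insensitive to the skewness noted in the text, which is presumably why a nonparametric procedure was preferred.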

Number of participations. A more readily 
obtained measure of discussion activity, since 
it does not depend upon transcriptions of the 
recorded discussions, is the number of dis- 
crete entries of an individual into the discus- 
sion. This measure is taken without regard 
to the length of a given participation. As an- 
ticipated, the distribution of participations in 
discussion closely resembled that for verbal 
output. Figure 2 summarizes these data. In 
the groups favorably disposed toward the 
communication, the number of entries into 
discussion shows fairly wide dispersion. The 
mean number of participations by members 
of the low E groups was 21.7, with some in- 
dividuals speaking as many as 70 times. 
Among the high Es, the mean number of
entries into discussion was 17.0. Those in the
middle groups who were disposed to speak
participated an average of 7.6 times.
Nonparametric analysis of variance showed
the means of the three groups to differ at the
.05 level of significance.

Fig. 2. Distributions of participations under the
three attitude conditions.

As in the case of verbal output, constric- 
tion of the range of participations is seen in 
moving from strong to weak attitudes. The 
rank-order correlation for the sample as a 
whole between verbal output and number of 
participations was .95. In view of the high 
correlation between these two measures, it 
would seem advisable in many situations to 
use participations alone as a measure of dis- 
cussion activity, since this index can be re- 
corded during the discussion without depend- 
ence upon an exact protocol. 
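
A rank-order correlation of this kind can be computed as sketched below. The sketch assumes the Spearman coefficient, calculated as the product-moment correlation between the two sets of ranks; the word and participation counts are hypothetical and are not the study's data.

```python
def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Rank-order correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical totals for five discussants (not the study's data).
words = [0, 120, 310, 450, 980]
participations = [0, 9, 30, 14, 62]

rho = spearman_rho(words, participations)
print(round(rho, 2))  # 0.9
```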

Rate of response. The number of responses 
emitted by an S during an interval of time is
a widely used measure of performance in a 
variety of situations. Insofar as attitude pre- 
disposes the individual to a characteristic pat- 
tern of relevant behavior, the response rate 
of groups composed of individuals with known 
attitudinal biases should be reflected in this 
measure. Rate of response in the six discus- 
sion groups was determined by summing the 
words emitted by both the leader and the 
group members during each of six time pe- 
riods and dividing by the length of the pe- 
riods in minutes. Since the discussions varied 
in length from 21 to 31 minutes, depending 
upon the willingness of the participants to 
continue, the measures describing temporal 
features of the discussions were plotted as a 
function of equal portions of the meetings 
rather than as functions of elapsed time. This 
procedure, which is analogous to plotting Vin- 
cent learning curves, made it possible to plot 
the behavior of all of the groups upon the 
same time base and to make direct compari- 
sons among them. The differences in dura- 
tion of the meetings are reflected in the meas- 
ures of verbal output which, however, are not 
simply artifacts of these differences since all 
of the groups were permitted to use a half- 
hour period if they were so inclined. 
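
The procedure described above can be sketched as follows. Each discussion is divided into six equal portions of its own length, and the words falling in each portion are divided by the length of the portion in minutes. The input format (a starting time in minutes and a word count for each utterance) and the rule of assigning an utterance to the portion in which it begins are assumptions made for illustration; the utterances shown are hypothetical.

```python
def vincentized_rates(utterances, total_minutes, n_periods=6):
    """Words per minute in each of n_periods equal portions of a discussion.

    utterances: iterable of (start_minute, n_words) pairs.
    """
    period_len = total_minutes / n_periods
    totals = [0] * n_periods
    for start_minute, n_words in utterances:
        # An utterance beginning exactly at the end goes into the last portion.
        idx = min(int(start_minute / period_len), n_periods - 1)
        totals[idx] += n_words
    return [total / period_len for total in totals]


# Hypothetical utterances from a 21-minute discussion (not the study's data).
short_meeting = [(0.0, 70), (3.4, 35), (20.9, 35)]
rates = vincentized_rates(short_meeting, total_minutes=21)
print(rates)  # [30.0, 0.0, 0.0, 0.0, 0.0, 10.0]
```

Because every discussion yields six values regardless of its length, a 21-minute meeting and a 31-minute meeting can be plotted on the same base.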

Fig. 3. Rate of response during discussion under
the three attitude conditions. The equations for the
regression lines are as follows: Low Es, Y = 5.08X +
152.17; High Es, Y = 2.81X + 136.21; Middle Es, Y
= -3.46X + 119.93.

Inspection of Fig. 3 shows that response
rates under the three conditions differ not
only in level but in trend. The over-all levels
of verbal activity throughout the course
of the discussions are predictable from the
verbal output measures. However, the trends
reveal interesting differences among the three 
degrees of ethnocentrism represented in the 
discussion groups. Regression lines were 
fitted to the data, and the respective b coefficients
were tested for significance. Both
the low E and the middle E groups showed 
a significant regression of response rate upon 
time. The slope of the function for the high 
E groups did not differ significantly from 
zero, due to greater variability in response 
rate at different periods in the discussion. 
The low E groups increased their rate of re- 
sponse throughout the discussion, while the 
middle E groups decreased in rate of verbal 
activity during the meetings. 
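
A least-squares line and the test of its slope can be sketched as below. The sketch assumes that X indexes the six equal time periods (1 to 6) and that Y is the rate of response in each; the rates shown are hypothetical, and the function name is illustrative.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares slope, intercept, and standard error of the slope."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    se_slope = sqrt(sse / (n - 2) / sxx)
    return slope, intercept, se_slope


# Hypothetical rates of response for six periods (not the study's data).
periods = [1, 2, 3, 4, 5, 6]
rates = [158, 160, 171, 169, 180, 181]

slope, intercept, se_slope = fit_line(periods, rates)
t = slope / se_slope
# With six points the t statistic has 4 df; 2.776 is the two-tailed .05 value.
print(abs(t) > 2.776)
```

A slope whose t value falls short of the critical value, as reported for the high E groups, is indistinguishable from zero at that level.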

The results indicate that rate of verbal re- 
sponse in a discussion situation may be pre- 
dicted from knowledge of the attitudinal dis- 
positions of the discussants. Those groups 
composed of individuals with a favorable atti- 
tude toward the communication under discus- 
sion display not only a high initial rate of re- 
sponse but also a gradual increase in verbal 
activity as the discussion progresses.
Individuals who are antagonistic toward the
communication show a lower over-all response
rate, greater variability in their rate of verbal 
activity, and no significant change in rate 
throughout the discussion. Discussants who 
are neutral or undecided with respect to the 
point of view presented for discussion display 
a generally low rate of response which de- 
clines toward the end of the discussion period. 

Spontaneity. In discussion situations
characterized by permissive or nondirective
leadership, opportunity is afforded for the group
members to display varying degrees of
initiative in maintaining the flow of comments.
This feature of discussion behavior may
appropriately be referred to as spontaneity, or
the extent to which the discussion is carried
by the group members without prompting
from the discussion leader. In the present
study the discussion leader avoided entering
the discussion except on those occasions when
activity lagged seriously or when he was
asked a direct question. A remark by a
group member which occurred without
immediate prior comment from the leader was
coded as spontaneous. A series of
participations initiated by the group members
without intervening leader participation,
therefore, was scored as spontaneous, whereas
comments elicited by the leader were scored
as nonspontaneous.

In order to adjust for differences in total 
verbal output of the groups a spontaneity 
index for each time period was computed as 
follows. The number of comments judged as 
spontaneous during each time interval was 
divided by the total number of participations 
for that period. This ratio was then multi- 
plied by the percentage of that particular 
group’s contribution to the verbal output of 
the six experimental groups combined. This 
procedure effectively weighted the spontaneity 
ratios according to the over-all verbal pro- 
ductivity of the attitudinal groups concerned. 
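
The index just described can be sketched as a single function. The parameter names are illustrative, the counts in the example are hypothetical, and returning zero for a period without participations is an assumption not stated in the text.

```python
def spontaneity_index(n_spontaneous, n_participations, group_words, all_groups_words):
    """Spontaneity ratio for one time period, weighted by the group's
    percentage share of the combined verbal output of all groups."""
    if n_participations == 0:
        return 0.0  # assumed convention for a silent period
    ratio = n_spontaneous / n_participations
    share_pct = 100.0 * group_words / all_groups_words
    return ratio * share_pct


# Hypothetical period: 15 of 20 participations spontaneous, in a group
# that produced half of all words spoken (not the study's data).
index = spontaneity_index(15, 20, group_words=5000, all_groups_words=10000)
print(index)  # 37.5
```

The weighting keeps a group that says little from obtaining a high index on the strength of a few unprompted remarks.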

Fig. 4. Trends in discussion spontaneity under the
three attitude conditions. The equations for the
regression lines are as follows: Low Es, Y = 1.83X +
29.54; High Es, Y = 2.36X + 15.24; Middle Es, Y

Points representing the level of discussion
spontaneity in the three predisposition
conditions are plotted by time periods in Fig. 4.
Regression lines have been drawn through the 
three sets of data to show more clearly the 
group trends. Several features of the dis- 
cussions are revealed. First, it is clear that 
well-defined biases toward the communication, 
whether positive or negative, are associated 
with higher levels of discussion spontaneity 
than are indeterminate attitudes. The low
Es begin the discussion with relatively little 
prompting from the leader and increase gradu- 
ally in spontaneity until the end of the dis- 
cussion period. This is the only significant 
trend among the three experimental condi- 
tions. Although commencing at a low level of 
spontaneity, the high Es appear to become 
progressively more spontaneous as the meet- 
ing continues, although the regression effect 
fails to reach an acceptable level of
confidence (.05 < P < .10). It should be
interesting to determine whether this apparent
recovery from an initially low degree of spon- 
taneity in the high E groups is in part due 
to the type of nondirective leadership pro- 
vided. Having discovered that their antago- 
nisms toward the communication will be ac- 
cepted, perhaps the high Es generate a greater 
degree of spontaneity than they would exhibit 
were the leader to reinforce the content of 
the communication. 

Members of the two neutral groups were 
barely able to produce any spontaneous
remarks early in the discussion, and they
showed but little improvement in this respect 
as the sessions continued. Their central po- 
sition in the distribution of ethnocentrism 
scores does not, of course, indicate the nature 
of their ethnocentric bias as clearly as the 
scores for the two extreme groups. Depend- 
ence upon the discussion leader in this situa- 
tion may reflect uncertainty about what to 
say in response to a communication that takes 
a positive position on matters about which 
they are undecided or toward which they are 
apathetic. 

Recruitment. Since the discussion group 
members were identifiable in terms of their 
row and seat designations it was possible to 
determine from examination of the transcripts 
the exact time at which any individual entered 
the discussion. A full description of the 
method employed is reported by one of us
elsewhere (McGinnies, 1956). We have
defined the cumulative rate at which new
individuals take part in the discussion as
recruitment. In Fig. 5 we have plotted the
recruitment functions of the discussion groups
under the several conditions of ethnocentrism.
Consistent with the other temporal measures, 
the low E groups are seen to have the most 
rapid rate of recruitment as well as the high- 
est final level of group participation. The 
high Es are recruited less rapidly into the 
discussion and reach a peak level somewhat 
later in time. Starting at an even lower 
point, the middle Es are recruited at about 
the same rate as the high Es, but they level 
off relatively early. The terminal levels of
group participation achieved in the three
situations are: low Es, 89%; high Es, 67%;
middle Es, 50%. Comparison of these
percentages by a t test for independent
proportions shows the low Es reaching a significantly
higher final participation level than the
middle Es (P < .02). The other differences were
not significant.
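
These comparisons can be approximated from the counts given earlier: 16, 12, and 9 of the 18 members in the low, high, and middle E conditions took part. The sketch below uses a pooled-variance normal approximation for the difference between two independent proportions, which stands in for the t test mentioned in the text; the exact formula the authors used is not given.

```python
from math import erfc, sqrt


def two_proportion_test(x1, n1, x2, n2):
    """z statistic and two-tailed P for the difference between two
    independent proportions, using the pooled estimate of variance."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_two_tailed = erfc(abs(z) / sqrt(2))
    return z, p_two_tailed


# Participants out of 18 members in each condition, as reported in the text.
z_low_mid, p_low_mid = two_proportion_test(16, 18, 9, 18)
z_low_high, p_low_high = two_proportion_test(16, 18, 12, 18)
z_high_mid, p_high_mid = two_proportion_test(12, 18, 9, 18)

print(p_low_mid < 0.02, p_low_high > 0.05, p_high_mid > 0.05)  # True True True
```

Under this approximation only the difference between the low and middle Es reaches significance, which agrees with the pattern reported.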

These relationships suggest that the rate 
of recruitment of small group members into 
a discussion is predictable from knowledge of 
their attitudes toward a communication. 
Agreement with the communication content 
appears to facilitate early entrance of potential
participants into the discussion, with few
additional discussants being recruited after 
the first half of the discussion period. Even 
a negative attitude toward the communication 
produces greater over-all participation than 
an indeterminate bias, although the rate of 
recruitment is about the same in both cases
and, again, reaches a maximum approximately
half-way through the discussion.⁴

Fig. 5. Rates of recruitment of discussion
participants under the three attitude conditions. The
curves are fitted by inspection.
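
A recruitment function of the kind plotted can be tabulated as sketched below: for each portion of the meeting, the percentage of group members who have made their first entry by the end of that portion. Dividing the meeting into six equal portions is an assumption carried over from the rate-of-response measure, and the entry times are hypothetical.

```python
def recruitment_curve(first_entry_minutes, group_size, total_minutes, n_periods=6):
    """Cumulative percentage of members who have entered the discussion
    by the end of each of n_periods equal portions of the meeting.

    first_entry_minutes: time of first participation for each member who
    spoke; members who never spoke are represented by None.
    """
    period_len = total_minutes / n_periods
    curve = []
    for k in range(1, n_periods + 1):
        cutoff = k * period_len
        entered = sum(
            1 for t in first_entry_minutes if t is not None and t <= cutoff
        )
        curve.append(100.0 * entered / group_size)
    return curve


# Hypothetical group of eight, four of whom never spoke (not the study's data).
entries = [1, 3, 6, 12, None, None, None, None]
curve = recruitment_curve(entries, group_size=8, total_minutes=30)
print(curve)  # [25.0, 37.5, 50.0, 50.0, 50.0, 50.0]
```

The final value of the curve is the terminal level of group participation, and the portion at which it stops rising marks the point where recruitment levels off.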


Discussion 


It has been demonstrated that five statisti- 
cal measures differentiate consistently among 
discussion groups with different attitudinal 
sets. These measures have the advantages of 
being both objective and reliable. Do they 
contribute anything to our understanding of 
discussion behavior that could not be ob- 
tained through more qualitative techniques of 
analysis? 

Bales (1950) has made a useful and sig- 
nificant distinction between the topical and 
the process content of discussion material. 
This classification distinguishes between what 
is said in a discussion and how it is said. A 
system of “interaction process analysis” de- 
vised by Bales provides categories for the 
coding of verbal units according to their
dynamic significance in the discussion. Remarks
are considered to indicate problems of
orientation, evaluation, control, decision,
tension-management, and integration. In an attempt
to determine whether predispositions of the
discussants would influence the discussion
process as conventionally measured, we
applied Bales’ interaction process analysis to
one discussion from each of the three attitude
groups. The results were similar to those
that we have obtained with this method when
used with the discussions of groups viewing
other mental health films. Namely, there
was an accumulation of comments in
Categories 5 and 6 of Bales’ system, which are
scored as “giving opinion, evaluation,
orientation, and information,” so that the profiles
of the several groups were essentially similar.
That the different groups were reacting
significantly to the film content, however, is
clearly revealed in the “contentless” indices
described. Those groups with strong,
although opposite, attitudes differ clearly in
their behavior from neutral groups.

4 Although based upon a relatively small N in the
present instance, the recruitment functions shown in
Fig. 5 are consistent with those obtained in prior
research with a large number of groups. Recruitment
of discussion participants in small groups dealing
with a congenial topic levels off at about 80%
shortly after the first half of a 30-minute discussion
period. In large groups, numbering up to 90,
recruitment proceeds in linear fashion throughout the
meeting but does not exceed 30% in a half-hour
period.

and Irwin Altman

One other possible interpretation of the 
results must be entertained. This lies in the 
fact that the California E scale correlates 
highly with authoritarianism. It is conceiv- 
able, therefore, that the differences in the 
discussion measures among the experimental 
groups might be attributable to personality 
differences rather than to ethnocentric atti- 
tude. Bass (Bass, McGehee, Hawkins, Young, 
& Gebel, 1953), for example, has reported 
that girls who score low on the F scale have 
higher leaderless-group-discussion scores than 
those who score high. On the other hand, 
Rokeach (1948) reports that verbalization 
scores during problem solving are much higher 
for high Es than for low Es, using problems 
unrelated to ethnocentrism. Rokeach (1956) 
also showed that both highs and lows scored 
high on a scale of dogmatism, suggesting that 
they might react in similar fashion to some 
situations. Our results bear out this hy- 
pothesis but, unlike Rokeach’s earlier find- 
ings, they indicate the low Es to be more 
verbal. In a study of the relationship be- 
tween personality predisposition and behavior 
in groups, Haythorn (Haythorn, Couch, Haef- 
ner, Langham, & Carter, 1956) found that 
four-man discussion groups composed of au- 
thoritarians differed in some respects from 
groups of equalitarians. Of 16 postmeeting 
observer ratings, however, only one discrimi- 
nated significantly between the high F and 
low F groups. There is little evidence, then, 
that authoritarianism, as such, is a signifi- 
cant variable determining those aspects of 
group discussion behavior that we have re- 
ported. 

The essential conclusion from our data 
would seem to be that well-defined attitudes 
toward a persuasive communication, regard- 
less of direction, are associated with a gener- 
ally higher level of discussion activity. A 
distinctly favorable attitude is reflected in 
greater verbal output, a progressively increas- 
ing rate of response, a high and accelerated 
degree of spontaneity, and rapid recruitment 
of participants. Antagonistic discussants rank 
second in these respects, while neutral recipi- 
ents of the communication reveal their apa- 
thy (or indecision) in all of the measures. 
The data supplement the usual measurements
of attitude change following a persuasive com- 
munication by reflecting social behavior rather 
than responses to a questionnaire. They indi- 
cate the feasibility of predicting group reac- 
tion to a communication when the initial 
attitudes of the members are known. Finally, 
they provide additional operational meaning 
for the concept of “attitude” as a predisposi- 
tion to respond in consistent fashion to rele- 
vant stimulation at the level of group analysis. 


Summary 

Five relatively precise “contentless” meas-
ures of the group discussion situation were
applied to the discussion protocols of six small 
groups of high school students, formed ac- 
cording to degree of personal ethnocentrism. 
The groups, designated as low, medium, or 
high in ethnocentric predisposition, discussed 
a film that attempted to explain and liberalize 
prejudice toward minority groups. Consistent 
differences among the three degrees of ethno-
centrism represented in the discussion groups
were reflected in the five indices. Those Ss 
favorably disposed toward the communication 
content showed a greater degree of discussion 
activity and spontaneity than did Ss who were 
antagonistic or neutral toward the communi- 
cation. These statistical measures served to 
differentiate the groups where other, more 
subjective, analyses failed. 


Received May 5, 1958. 
When ratings are used as criterion meas-
ures, more ultimate criteria of performance
are generally not available for validating 
them; indeed if more ultimate measures were 
available, ratings probably would not be em- 
ployed in the first place. It is necessary, 
therefore, either simply to accept the ratings 
as valid or to seek indirect indications of 
their validity. Two such indications are often 
employed. The first is the reliability of the 
ratings as shown by the amount of agreement 
among scores assigned the same ratees by 
different raters. The second is the predict- 
ability of the ratings, or the extent to which 
they correlate with measures to which they 
should be related, according to either the re- 
sults of logical analyses or previous research. 

The purpose of this study was to investi- 
gate the hypothesis that high agreement 
among the ratings assigned the same men by 
different raters does not necessarily imply 
predictable or valid ratings and that disagree- 
ment among raters may be associated with 
predictability and possibly validity. 

The hypothesis is based first on the as- 
sumption that ratee behavior in most perform- 
ance rating situations is not entirely consistent 
from one time to the next with respect to 
particular traits, primarily because no effort 
is made to control the physical and psycho- 
logical environment during the period the 
ratings are designed to cover. To be valid, 
then, ratings must reflect these inconsistencies. 

1 This study was performed as part of a criterion
research project supported by the Personnel and
Training Branch, Psychological Sciences Division,
Office of Naval Research, under Contract Nonr
1241(00). Reproduction in whole or in part is per-
mitted for any purpose of the United States Gov-
ernment.

2 Robert R. Mackie, Director of Research, and
Albert Harabedian of Human Factors Research made
valuable contributions to the conduct of the study
and the preparation of this report.

Even if ratees behaved entirely consistently
with respect to particular traits regardless of
the situation, ratings of them would not neces- 
sarily be in agreement since raters use differ- 
ent criteria in rating on the same trait (Guil- 
ford, 1954, p. 295). The second assumption, 
then, is that these criteria employed by differ- 
ent raters are all valid and the differences 
in ratings reflected by them are also valid. 
This second assumption implies that part of 
achieving the ultimate in performance is satis- 
fying the demands of various superiors by 
behaving in different ways. 

On the basis of these assumptions, high 
agreement among ratings could imply a poor
sampling of observations of ratee behavior by
raters, a poor sampling of raters in terms of
the criteria they use to evaluate particular 
traits, or both. Disagreement among the 
ratings assigned to the same men by different 
raters, on the other hand, might indicate that 
a more representative sample of observations 
and rater criteria was obtained. 

It is obvious on the basis of these two 
assumptions that high interrater agreement 
could also indicate validity. If a ratee knew 
the criteria his superiors were going to employ 
in rating him, for example, he could behave, 
assuming he had sufficient control of his be- 
havior regardless of the environmental situa- 
tion, so as to satisfy them. It is assumed, 
however, that the majority of men are not 
entirely aware of the nature of their supe- 
riors’ criteria, and even if they are, they 
neither have adequate control over their be- 
havior in all situations nor are obsequious 
enough to attempt to satisfy all of their 
superiors’ demands. 

To reiterate, the hypothesis tested in the 
present study was that high agreement among 
the ratings assigned the same men by differ- 
ent raters does not necessarily imply predict- 
able or valid ratings. 
Method


The samples. A total of 171 men aboard 21 dif- 
ferent submarines of the Pacific Fleet were rated in 
groups of four to nine by three of their superiors, 
either by two officers and one chief petty officer 
(CPO) or one officer and two CPO’s. Those men 
who had been aboard their assigned boats for a 
period of at least 10 months were selected from this 
total sample for the investigation reported here. 
There were 97 such men so an additional three men 
were randomly selected from the group that had 
been aboard for nine months and added to the 
sample to make it an even 100 ratees. 

The rating scale. A general trait scale containing 
10 technical competence (TC) traits and 10 personal 
adjustment (PA) traits was used. The ratings were 
assigned on a scale of 25 hypothetical submariners, 
small sailor-like figures extending from the bottom 
left to the top right of each page of the rating book- 
let. Verbal descriptions were added to the extreme 
figures as well as the middle figure which was de- 
scribed as “the ordinary submariner of his rate.” 
Each rater assigned his ratings independently on one 
trait at a time and rated only men of the same 
job classification and rank at one time. 

Only the means of the 10 TC and the 10 PA 
trait ratings assigned by each rater were used in 
this study. Since there were three raters, there were 
three such mean scores for each ratee on each of the 
two classes of traits. The means of these, i.e., the
ratee’s total mean rating on the PA and on the TC 
traits, were used in the correlational analyses designed 
to test the hypothesis and in computing the total 
variance term for the interrater agreement estimates. 

Estimates of interrater agreement. An agreement 
score was computed for each ratee; it was the sum 
of the squared deviations of the three rater means 
about the ratee’s total mean. The distribution of 
these scores was divided at the 25th, 50th, and 75th 
centiles to yield four groups of 25 ratees each: the 
high agreement (HA), moderate agreement (MA), 
moderate disagreement (MD), and high disagree- 
ment (HD) samples. Interrater agreement estimates 
were made for the four samples by using the sum 
of the agreement scores as the error variance 
term and the variance of the total mean ratings as 
the total variance in the basic equation for the 
coefficient of reliability (Guilford, 1950). 
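
As an illustration only, the agreement score and the agreement estimate described above can be sketched in Python. The function names are hypothetical. The normalization of the error term is an assumption: the text states only that the sum of the agreement scores served as the error variance term, so the sketch pools the squared deviations over n(k - 1) degrees of freedom and divides by k to approximate the error variance of a mean of k ratings. A different normalization would yield different coefficients.

```python
from statistics import mean, variance


def agreement_score(rater_means):
    """Sum of squared deviations of one ratee's rater means about their mean.

    A small score means the raters agreed closely on this ratee.
    """
    m = mean(rater_means)
    return sum((r - m) ** 2 for r in rater_means)


def split_into_agreement_groups(scores):
    """Divide ratee indices at the quartiles of the agreement scores.

    Returns a dict mapping HA, MA, MD, HD to lists of ratee indices,
    ordered from smallest score (high agreement) to largest.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bounds = [len(order) * j // 4 for j in range(5)]
    labels = ["HA", "MA", "MD", "HD"]
    return {lab: order[bounds[j]:bounds[j + 1]] for j, lab in enumerate(labels)}


def interrater_agreement(ratings_by_ratee):
    """Reliability-style estimate: 1 - error variance / total variance.

    ASSUMPTION (not stated in the text): the error variance is the pooled
    within-ratee variance divided by the number of raters k.
    """
    n = len(ratings_by_ratee)
    k = len(ratings_by_ratee[0])
    total_means = [mean(r) for r in ratings_by_ratee]
    total_var = variance(total_means)
    ss_error = sum(agreement_score(r) for r in ratings_by_ratee)
    error_var = ss_error / (n * (k - 1)) / k
    return 1 - error_var / total_var
```

Under these assumptions, a sample in which the three raters never differ yields an estimate of 1, and the estimate falls as the within-ratee scatter grows relative to the spread of the total mean ratings.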

This procedure of computing agreement scores for 
each ratee and dividing the sample so as to achieve 
four levels of interrater agreement was carried out 
separately for the TC and the PA ratings. The cor- 
relation between the two classes of ratings was high 
so there was overlap between the samples; never- 
theless many of the ratees in the HA-TC sample, 
for example, were not in the HA-PA sample. 

The criteria of predictability. Three measures were 
used to compare the predictability of the ratings: 
the Navy General Classification Test (GCT), the 
Navy Mechanical Aptitude Test (MECH), and the 
Submarine School Class Standing (SSCS). Previous 
research had shown that these variables were sig- 


nificantly related to performance aboard submarines 
as measured by ratings, check lists, and job sample 
performance tests (Mackie, Wilson, Buckner). SSCS, 
which is based on a composite of written achieve- 
ment test scores and instructor ratings and has an 
estimated reliability of .90, was found to correlate 
higher with scores on the shipboard criteria than 
any of a variety of predictor variables studied. It 
was selected from the measures available for this 
study, therefore, as the variable most likely to be 
related to the ultimate criterion and as probably the 
best indicator of the validity as well as the pre- 
dictability of the ratings. 

Comparability of the experimental samples. F and 
t tests were performed to determine whether or not 
the experimental samples could be assumed to have 
been obtained from the same population with re- 
spect to the variables relevant to the study. None 
of the differences between means or variances of 
scores on the three predictor variables was signifi- 
cantly different from zero. The differences between 
the means and variances of months on board were 
not significantly different from zero. The differences 
between the means and variances of the ratings as-
signed the men in the experimental groups were not
significantly different from zero; the mean of all
ratings was 14.7 and the standard deviation was 4.3.
The experimental samples were assumed on the basis
of the statistical tests to have been obtained from
the same population with respect to these variables.

Correlational analyses. Scores on the three pre- 
dictor variables, SSCS, GCT, and MECH, were 
correlated with the ratees’ total mean ratings. Sepa- 
rate analyses were performed for each of the experi- 
mental groups and for both the TC and the PA 
ratings. The score on SSCS actually used in the 
computations was the proportion of men in his sub- 
marine school class each ratee exceeded. Pearson 
product-moment coefficients were computed. The 
correlations were computed using both raw and 
standard scores. The results were essentially the 
same. The raw score results are reported here.
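
The correlational step can be sketched as follows. This is a minimal illustration, not the authors' computing procedure: the function names are hypothetical, and the significance test is assumed to be the usual two-tailed t test with N - 2 degrees of freedom, which the text does not specify. The tabled t values for 23 degrees of freedom (2.069 and 2.807) are supplied by hand rather than computed.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardize(x):
    """Convert raw scores to standard scores (population SD)."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / n)
    return [(a - m) / sd for a in x]


def critical_r(t_crit, n):
    """Smallest |r| reaching significance, given a tabled t value.

    ASSUMPTION: t test of r against zero with n - 2 degrees of freedom.
    """
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# With N = 25 ratees per sample (23 df), two-tailed:
r_05 = critical_r(2.069, 25)  # about .396
r_01 = critical_r(2.807, 25)  # about .505
```

Because r is unchanged by any linear rescaling of either variable, raw and standard scores must give the same coefficient apart from rounding, which agrees with the statement above that the two sets of results were essentially the same.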


Results 


The interrater agreement estimates and the 
results of the correlation analyses are shown 
in Table 1. 

None of the correlations between scores on 
the predictors and either the PA or TC rat- 
ings was significantly different from zero in 
the agreement samples, HA and MA. All 
three of the correlations with the TC ratings 
in the MD sample were significant, the cor- 
relation between SSCS and the TC ratings 
being significant at the .01 level and the other 
two at the .05 level. SSCS was also signifi- 
cantly correlated (.05 level) with the TC 
ratings in the HD sample for which the inter- 
rater agreement estimate was .00. 
Table 1


Interrater Agreement Estimates and the Correlations Between Scores on the Predictor
Variables and the Total Mean Ratings


(N = 25 ratees in each of the eight samples)


Sample    Interrater agreement, TC ratings
HA        .94**
MA        .84**
MD        .69**
HD        .00


* Significant at .05 level.
** Significant at .01 level.

Only two of the correlations computed be- 
tween the predictor scores and the PA ratings 
were significant and both were with the rat- 
ings for which the interrater agreement esti- 
mate was lowest, .12 in the HD sample. The 
correlation between SSCS and the high dis- 
agreement PA ratings was significant at the 
.01 level and the correlation between scores
on the GCT and those ratings was significant 
at the .05 level. 

Additional analyses were performed in an 
effort to locate the source of the predictable 
variance in the ratings for which the inter- 
rater agreement estimates were low, i.e., the 
MD and HD groups. Only the means of 
the ratings on the 10 technical competence 
traits were used in these analyses. 

First it was hypothesized that the more 
extreme rating with respect to the mean of 
the three assigned a ratee was contributing 
more predictable variance than the other two. 
The procedure employed in testing the hy- 
pothesis was as follows: the ratings assigned 
the 50 ratees in the combined MD and HD
samples were plotted on a large chart. In-
spection showed that all three raters dis-
agreed in their evaluations of some men. In
the case of others, two of the raters were in
substantial agreement and only the third dis-
agreed. The 25 ratees (half of the combined
HD and MD samples) for whom this latter
pattern was most pronounced were selected
for study. Scores on the predictor variables
were then correlated both with the extreme
rating assigned each of these 25 men and
with the mean of the other two ratings given
them. The results are shown in Table 2.


Table 2


Correlations Between Scores on the Predictor
Variables and the One Disagree and the
Mean of the Two Agree Ratings


(N = 25 ratees, TC ratings only)


Ratings         SSCS    GCT            MECH
One disagree    .50*    [illegible]    .50*
Two agree       .35     .30            .41*


* Significant at .05 level.
** Significant at .01 level.


[Table 1, continued: legible entries from the GCT and MECH columns (TC and PA ratings) are -.23, -.07, .16, .02, .01, .18, and .06; their cell positions are not recoverable.]

The differences between the means and 
variances of the two samples of ratings were 
not significantly different from zero. The 
analyses showed, however, that the mean of 
the ratings assigned by the two raters who 
were in closer agreement correlated less with 
the predictors than did the ratings assigned 
by the rater who disagreed. 
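
The separation of each ratee's three ratings into the one "disagree" rating and the mean of the two "agree" ratings can be sketched as below. The names are hypothetical. The selection index is an assumption introduced only for illustration: the authors chose the 25 ratees by inspecting a chart, whereas the sketch ranks ratees by how far the extreme rating lies from the mean of the other two, less the gap between those two.

```python
def split_ratings(ratings):
    """Return (extreme rating, list of the remaining ratings).

    The extreme rating is the one farthest from the mean of all ratings;
    ties go to the earlier rater.
    """
    m = sum(ratings) / len(ratings)
    idx = max(range(len(ratings)), key=lambda i: abs(ratings[i] - m))
    others = [r for i, r in enumerate(ratings) if i != idx]
    return ratings[idx], others


def one_disagree_two_agree(ratings):
    """Return (the 'one disagree' rating, mean of the 'two agree' ratings)."""
    extreme, others = split_ratings(ratings)
    return extreme, sum(others) / len(others)


def pattern_strength(ratings):
    """HYPOTHETICAL index of how pronounced the two-agree pattern is."""
    extreme, others = split_ratings(ratings)
    return abs(extreme - sum(others) / len(others)) - (max(others) - min(others))


def most_pronounced(ratings_by_ratee, n_keep):
    """Indices of the n_keep ratees with the strongest pattern."""
    order = sorted(range(len(ratings_by_ratee)),
                   key=lambda i: pattern_strength(ratings_by_ratee[i]),
                   reverse=True)
    return order[:n_keep]
```

Each of the two resulting scores per ratee would then be correlated separately with SSCS, GCT, and MECH.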

The same sort of analysis was performed 
using the entire combined MD and HD sam- 
ples. Again only the TC ratings were used. 
In this case, of course, the agreement between 
the “two agree” raters was not as great, and
in some cases the rating of the one “disagree” 
rater was not much farther removed from the 
mean of the three ratings than the rating 
given by one of the “two agree” raters. As 
shown in Table 3, the two sets of ratings 
were almost equally predictable from SSCS. 
However, scores on the GCT and MECH 
variables correlated significantly (.05 level) 
with the more deviant rating and not with 
the mean of the ratings assigned by the two
raters who were in closer agreement in their
evaluations.


Table 3


Correlations Between Scores on the Predictor
Variables and the One Disagree and the
Mean of the Two Agree Ratings
(N = 50 ratees, combined HD and MD samples,
TC ratings only)

Ratings         SSCS     GCT     MECH
One disagree    .43**    .30*    .29*
Two agree       .44**    .19     .25


* Significant at .05 level.
** Significant at .01 level.


Discussion 


The results showed that ratings of ship- 
board performance for which interrater agree- 
ment estimates were high were less predictable 
from scores on two aptitude tests and school 
achievement than were ratings for which the 
interrater agreement estimates were moderate 
and low. They indicate that high interrater 
agreement does not necessarily imply predict- 
ability in performance ratings and that in 
some instances interrater agreement and pre- 
dictability would yield incompatible indica- 
tions of the validity of ratings. 

Whether or not the results can be inter- 
preted to mean that interrater agreement is 
not necessarily a good index of the validity 
of ratings depends on whether or not one is 
willing to assume that the predictor variables 
employed in the study are positively related 
to the ultimate criterion of the performance 
that was rated. It is interesting to note, how-
ever, that SSCS, which had been shown to be
the single variable most highly related to
various other criteria of shipboard perform-
ance (ratings, check lists, and practical per-
formance and job knowledge tests), was also
the variable that showed the most significant 
positive relationships with the ratings for 
which the interrater agreement estimates were 
low. 

Ghiselli and Brown (1948), in summarizing 
the bases of unreliability in ratings, state, 
“The indication is that raters disagree pri-
marily because they observe the individuals
to be rated in different situations and under
different conditions, and because they use 
different criteria for judging the same trait 
or characteristic.” The two factors that they 
say contribute to a lack of agreement in 
ratings were assumed in the development of 
the hypothesis tested here to contribute to 
their validity, as long as they are accurately 
reflected in the ratings. They continue by 
saying, “It follows from this evidence that 
reliability of ratings can be considerably in- 
creased by having the raters observe the indi- 
viduals under similar conditions, and by pro- 
viding techniques for making the ratings that 
will increase the likelihood that the traits or 
characteristics being judged will be evaluated 
on the same bases.” 

It is probably true that having raters ob- 
serve ratees under similar conditions would 
increase interrater agreement; it might also 
serve, however, to decrease validity for the 
ultimate criterion by failing to take into ac- 
count the variations in behavior that occur 
as a result of the changing conditions in the 
real on-the-job situations and the possible 
interactions between ratee performance and 
environmental conditions. 

Different members of a work group might 
react or perform well in one situation and 
poorly in another. Certainly with the variety 
of situations individuals face from day to day 
regardless of their occupations, they could 
not be expected to react consistently in all 
of them. Submariners, for example, live in 
a potentially threatening environment faced 
with the possibility of a tremendous variety 
of situations. In the operational environ- 
ment, officers and CPOs cannot observe their 
men perform either under similar conditions 
or in all situations, not only because of the 
physical layout of the boat but also because 
they have their own jobs to perform. Obser- 
vations of ratee behavior are of necessity 
almost chance occurrences. To develop a 
method whereby the ratees could be observed 
under similar conditions, even if it were possi- 
ble, would probably imply the exclusion of 
critical situations in which a man’s behavior 
would have potentially the greatest signifi- 
cance as far as his contribution to the effec- 
tiveness of the boat is concerned. The oppor- 
tunity of observing behavior in the critical 
situations may be limited and at least
partly dependent on chance; nevertheless, in- 
creasing interrater agreement by having raters 
observe ratees under similar conditions might 
defeat the more important purpose of obtain- 
ing valid ratings. 

With standardized environmental condi- 
tions such as Ghiselli and Brown suggest and 
with rater training so as to reduce the number 
of independent criteria raters employ in rating 
on particular traits, higher interrater agree-
ment would probably imply greater predict-
ability. Differences between the ratings as-
signed the same men by different raters would
probably reflect error variance. It is pro- 
posed, however, that such differences resulting 
from ratings being made in the entirely un- 
structured on-the-job environment may reflect 
real differences in ratee behavior and, thus, 
true variance. 

It is not being suggested that high inter- 
rater agreement always implies a lack of pre- 
dictability. Essentially none of the variance 
in the high agreement ratings and only a 
portion of the variance in the low agreement 
ratings was predictable from scores on the 
three predictor variables used in this study. 
It is conceivable that both of these sources 
of variation could be predicted from other 
types of measures. 


Summary 


The hypothesis tested was that high agree- 
ment among the ratings assigned the same 
men by different raters does not necessarily 
imply predictable ratings. 

Two groups of ratings, personal adjust- 
ment and technical competence trait ratings, 
made by three superiors (officers and
chief petty officers) of 100 submariners serv-
ing aboard 21 different submarines, were each
divided into four samples so as to achieve
four levels of interrater agreement: .94, .84,
.69, and .00 for the technical competence
ratings and .88, .90, .61, and .12 for the per- 
sonal adjustment ratings. Correlations were 
then computed within each sample between 
three predictor variables (Submarine School 
Class Standing and the Navy General Classi- 
fication and Mechanical Aptitude Tests) and 
the mean of the three ratings assigned to 
each ratee. 

The hypothesis was supported by the re- 
sults. None of the 12 correlations between 
the predictor variables and the ratings for 
which the interrater agreement estimates were 
high (.94, .84, .88, and .90) was significantly 
different from zero. Six of the 12 correlations 
computed for the low agreement ratings (.69, 
.00, .61, and .12) were significantly different
from zero, two at the .01 level and four at 
the .05 level. Three of the six significant 
correlations were with the ratings for which 
the interrater agreement estimates were not 
significantly different from zero, .00 and .12. 

It was concluded that high interrater agree- 
ment does not necessarily imply predictable 
ratings and may in some instances indicate a 
lack of predictability. 


Received May 6, 1958. 
The work of Mackworth (1950) has stimu-
lated investigations of “vigilance” behavior
from several points of view. For example, 
the Maryland group has been interested in 
the phenomenon from the viewpoint of stress 
and fatigue (Andrews and Ross, 1955; Whit- 
tenburg, Ross, and Andrews, 1956). Other 
investigations have been concerned with the 
characteristics of the signal such as rate of 
target presentation (Deese and Ormond, 
1953), intensity and duration of the signal 
(Adams, 1956), and changes in sensitivity to 
the signal (Bakan, 1955). Recent approaches 
to the problem of vigilance may be found in 
the reports of the 1956 Psychology Section 
meeting of the British Association for the 
Advancement of Science (Mackworth, 1956; 
British Association, 1957). 

The performance functions presented by
Mackworth are based on averaged data.
There is wide variation among individuals,
and some do not exhibit decrement in per-
formance. Individual variation is unrelated
to visual acuity. Although the need for indi- 
vidual predictors has been stressed (British 
Association, 1957), no characteristic has been 
found which correlates highly with vigilance 
performance. 

Since the decline of performance has been 
explained in terms of a state of drowsiness 
induced by the monotony of the monitoring 
situation (Bakan, 1955), an effective dimen-
sion for differentiating individual perform-
ance might be “level of activation” as de-
scribed by Schlosberg (1954). This concept
places all emotional behavior on a continuum 
ranging from sleep to extreme excitement. 
Skin conductance as an index of activation
level has been related to such variables as
hand steadiness and reaction time (Schlos-
berg, 1954). Two recent studies also find
conductance related to the individual’s per-
formance. Hussman and Hackman (1955)
found significant inter- and intra-individual
differences between GSR and flying perform-
ance measures of pilots. Pilots who per-
formed better generally exhibited a higher
GSR. Parker and Hackman (1955) found
that the GSR level of Naval pilots while
viewing statements concerning flight safety
procedures was related to flying skill.

T. G. Andrews for his aid with the analysis of the
results of this experiment, and to T. A. Hussman
conductance meter.

3 Now at Psychological Service of Pittsburgh.

The present study used change in basal 
conductance as the indicator of activation 
level and was undertaken as a preliminary 
study to deal with the following questions: 
(a) Does the conductance change during a 
vigilance task exhibit a systematic trend? 
(b) If changes in conductance do exhibit a
systematic pattern, will the pattern be unique
for each individual or descriptive of all Ss?
(c) Is the conductance trend related to effi-
ciency of performance?


Procedure 


The apparatus used to record skin conductance 
is a microammeter calibrated to read conductance. 
(See Fig. 1.) The meter reads directly the current 
passing through the hand which, in this circuit, 
equals conductance. This relationship holds because
the applied voltage is adjusted to 1 volt, thus I 
equals 1/R. The reciprocal of resistance is conduct- 
ance. The 28.6 K resistor is used as the standard 
resistance of an S in order to obtain the desired 
calibrated current to apply. The S’s resistance is 
introduced into the circuit by inserting the plug 
containing the leads from the palm electrodes into 
the jack, which substitutes S’s resistance for the 
28.6 K resistor. The instrument has two scales de- 
pending on the position of the double pole double 
throw switch (D.P.D.T.). When the switch is in 
the X1 position, the meter face scale reads directly 0
to 50 microamperes. When the switch is on X2, the
1.8 K resistor is then in parallel with the meter move-
ment, and the meter reads from 0 to 100 micro-
amperes. Copper electrodes and electrode paste were
used for the palm contacts. 
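
The arithmetic behind the meter can be sketched as follows. This is an illustration under stated assumptions rather than a description of the actual instrument: the function names are hypothetical, and the resistance of the meter movement is not given in the text. It is set equal to the 1.8 K shunt here only because a shunt equal to the meter resistance halves the meter current, which would double the range from 50 to 100 microamperes.

```python
def current_microamperes(resistance_ohms, volts=1.0):
    """Current through a resistance by Ohm's law, I = V / R, in microamperes."""
    return volts / resistance_ohms * 1e6


def conductance_micromhos(current_ua, volts=1.0):
    """Conductance G = I / V. At 1 volt the microampere reading equals
    the conductance in micromhos."""
    return current_ua / volts


def meter_share(shunt_ohms, meter_ohms):
    """Fraction of the total current that passes through the meter
    when a shunt resistor is placed in parallel with it."""
    return shunt_ohms / (shunt_ohms + meter_ohms)


# The 28.6 K standard resistor at 1 volt passes roughly 35 microamperes.
standard_current = current_microamperes(28_600)

# ASSUMED meter resistance of 1.8 K: half the current reaches the meter,
# so a full-scale deflection corresponds to twice the current.
share = meter_share(1_800, 1_800)
```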

The Ss used were six men and three women stu- 
dents at the University of Maryland. Each S had 
20/20 visual acuity corrected or uncorrected. A 
modified form of the Mackworth “clock” apparatus 
was used (Whittenburg et al., 1956). The S was re-
quired to detect double jumps of a clock pointer
which made discrete jumps every 2 sec., and to re- 
spond to such jumps by pressing a switch. 

The same number and temporal pattern of double 
jumps employed by Mackworth was used. Double 
jumps occurred at .75, 1.5, 3.0, 5.0, 7.0, 8.0, 13.0, 
14.0, 15.0, 17.0, 20.0, and 30.0 min. during each half 
hour period. The four half-hour periods followed 
each other without interruption. During the 2 hr. 
session there were 7200 pointer movements, of which 
48 were double jumps. The performance measures 
were errors of omission (failures to detect a double 
jump) and errors of commission (responses when 
double jumps did not occur). 

The 12 X 10 ft. testing area was enclosed by parti- 
tions, 8 ft. tall. S sat 7 ft. from the clock, which 
fitted into the forward partition at a height of 3.5 ft. 
Palm contacts were placed on the front and back of 
the left hand. The S rested his left forearm and 
hand on the wide left arm of the chair during the
2 hr. session. The response switch was mounted on 
the right arm of the chair, positioned so it could be 
pressed by S while his arm was in resting position. 


Table 1 


Microampere Readings at 5 Min. Intervals for Nine Ss During Clock Test 


1 2 3 4 


— - 41 
24 36 
16 33 
24 : 41 
18 43 
18 42 
22 24 
21 26 
30 E 45 
27 f 31 
26 25 
46 
29 
38 J 27 
60 42 
60 x 36 
24 
54 24 
56 27 
74 38 
39 
35 
31 
36 
41 


34 
21 
Fig. 2. Conductance change for each S including cluster membership. S is identified by number above
the curve just inside the ordinate. The solid curve and points indicate Cluster 1; dashed lines and x’s
indicate Cluster 2; dotted lines and circles indicate Cluster 3. The numbers above each curve at 30, 60,
90, and 120 min. are total errors of omission and commission made by the S during the preceding
half-hour period.


E remained outside the testing area during the ses-
sion.

Conductance readings were taken at 5 min. inter-
vals, at the time of the double jumps and, in addi-
tion, 5, 10, and 15 sec. after the double jumps. The
readings at and immediately after the double jumps
were included to determine the effect, if any, of the
critical stimulus on conductance.

A 5 min. practice preceded the testing session.
Double jumps occurred at 0.5, 1, 2, 4, and 5 min.,
and were pointed out by the E to emphasize their
relation to the single jumps.


Results


The percentages of double jumps omitted
by all Ss were 5% in the first half-hour, 5%
in the second, 14% in the third, and 20% in
the last half-hour. The conductance readings
at 5 min. intervals for each S are shown in
Table 1. In order to index the consistency
of trend of conductance changes during the
testing session, autocorrelations were com-
puted for the sequence of 25 observations on
each S. There were 24 readings in the se- 
quence for Ss 1, 2, and 3 due to the lack of 
a reading at the start of the testing session. 
Successive pair in the sequence were corre- 
lated, i.e., 1 with 2, 2 with 3, etc. The ‘cor- 
relation coefficients for the 9 Ss were as fol- 
lows: (1) .96, (2) .94, (3) .98, (4) .17, (5) 
81, (6) .26, (7) — .O1, (8) .97, and (9) .90. 
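
The autocorrelation index described above can be sketched in a few lines. This is only an illustration, assuming the coefficient is a product-moment correlation between each reading and its successor (a lag-1 autocorrelation); the function names and the sample readings are hypothetical and are not data from the study.

```python
from math import sqrt


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lag1_autocorrelation(readings):
    """Correlate reading 1 with 2, 2 with 3, and so on through the sequence."""
    return pearson(readings[:-1], readings[1:])


# Hypothetical conductance readings (microamperes) at 5 min. intervals.
steady_rise = [20, 22, 25, 27, 30, 32, 35, 37]
print(round(lag1_autocorrelation(steady_rise), 2))
```

A sequence that drifts steadily in one direction yields a coefficient near 1, while a sequence that fluctuates irregularly around a level yields a coefficient near zero, which is the contrast the nine reported values draw.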

The following procedure was used to ob- 
tain conductance indices more representative 
of a given time interval. Readings up to the 
first 20 min. of the testing session were dis- 
carded to eliminate changes due to initial ad- 
justment to the testing situation. The re- 
maining readings were grouped into triads 
and the average of each triad taken. The 
obtained value spans 10 min. In these suc- 
cessive averages, the last reading in the triad 
is included as the first reading in the next 
group. The rank order correlation of the 
conductance readings of each S with every 
other S was then computed. A cluster analy- 
sis was performed on this correlation matrix, 
and three distinct clusters emerged: Cluster 
1: Ss 1, 2, 6, 8; Cluster 2: Ss 3, 5, 9; and 
Cluster 3: Ss 4, 7. Fig. 2 summarizes these 
results. 
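
The smoothing and correlation steps above might be sketched as follows. This is a rough illustration under stated assumptions: the input list is taken to hold only the readings retained after the initial adjustment period, tied values receive their mean rank, and all function names and sample values are hypothetical. The cluster analysis itself is not sketched because the source does not say which method was used.

```python
from math import sqrt


def overlapping_triad_means(readings):
    """Average successive triads; the last reading of one triad opens the next."""
    return [sum(readings[i:i + 3]) / 3 for i in range(0, len(readings) - 2, 2)]


def average_ranks(values):
    """Rank values from 1 upward, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rank_order_correlation(x, y):
    """Rank order correlation: product-moment correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical retained readings for two Ss (not data from the study).
s_a = overlapping_triad_means([30, 32, 31, 35, 36, 38, 41, 40, 44])
s_b = overlapping_triad_means([60, 58, 57, 55, 56, 52, 50, 49, 47])
print(round(rank_order_correlation(s_a, s_b), 2))
```

With readings every 5 min., each triad mean spans 10 min., and a rising trend paired with a falling trend gives a strongly negative coefficient, which is what would separate an ascending S from a descending one in the correlation matrix.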

Cluster 1 is characterized by a gradual in- 
crease in conductance during the testing ses- 
sion, and Cluster 2 by a consistent decline 
over the 100 min. interval. The direction of 
these curves is seen to be associated with 
conductance level at the 20 min. point. The 
conductance of all Ss in Cluster 1 at 20 min. 
was lower than the conductance of the Ss in 
Cluster 2. It is noted that the Ss in Cluster 
2 maintain a higher conductance level until 
the 70 min. point. After this time three of 
the ascending functions exceed one or more 
of the descending functions. This apparent 
converging of those conductance levels which 
did change is shown in the decreased range of 
conductance readings for these clusters, from 
126.7 at 20 min. to 58.7 after 120 min. Clus- 
ter 3 consists of two Ss whose conductance 
trend is represented by straight lines. A finer 
time analysis revealed these as approximately 
cyclical. 

The average numbers of errors of omission 
and commission within each cluster were as 
follows: Cluster 1, 8.0; Cluster 2, 5.7; and 
Cluster 3, 7.5. These results suggest that 
high conductance may be associated with bet- 
ter performance. To examine this relation- 
ship further, two groups were formed for each 
half-hour period according to the conduct- 
ance level of the Ss at the end of each period. 
The “upper” group included the four Ss with 
the four higher conductance readings; the 
“lower” group included four Ss with the four 
lower readings. The S with the median con- 
ductance level for each period was not in- 
cluded. For any one period, the groups could 
be composed of different Ss. Table 2 com- 
pares the performance of these groups. The 
differences between total errors were in the 
expected direction, but were not statistically 
significant. 

The total of 15 errors of commission had 
the following distribution for the half-hour 


Table 2 

Comparison of the Performance of Groups Based on the 
Four Higher and Four Lower Conductance Levels 
at the End of Each Half-Hour Period 
(Errors include omission and commission) 

              Half-Hour Periods 
Group      1     2     3     4     Total 
Upper      2     3     6     7       20 
Lower     12     1     8    15       36 

time periods: 11, 1, 1, and 2. This agrees 
with a previous observation (Adams, 1956) 
that most false reports occur early in the ses- 
sion, but with time the irrelevant stimuli are 
better discriminated. Ss 1, 2, 4, and 6 who 
were responsible for the 11 errors in the first 
session had conductance readings of 60 mi- 
croamperes or less during the first half-hour. 
The Ss with the four higher conductance lev- 
els during that period had no errors of com- 
mission. This result does not support the 
view that the more excited Ss are prone to 
react to irrelevant stimuli. 


Discussion 


The consistency of the individual conduct- 
ance trends shows a regular and continuous 
readjustment process by the S to a prolonged 
and monotonous task. Six of the nine auto- 
correlations exceeded .80, which indicates that 
conductance change is related to monitoring 
time. Two of the remaining trends can be 
considered consistent since they described a 
cyclical pattern during the session. 

These individual trends can be classified 
into three types of reaction: (a) an increas- 
ing basal conductance which suggests that 
the S expends greater effort in order to re- 
main vigilant, (b) a decreasing basal con- 
ductance which can be interpreted as an in- 
ability by the S to maintain a high state 
of vigilance, (c) fluctuation of conductance 
around an average level suggesting continuous 
compensatory efforts by the S to maintain a 
given level of vigilance. To account for a 
pattern like that of S 7, allowance must be 
made for the degree to which an individual 
will tolerate the discomfort induced by main- 
taining high efficiency. 

These generalizations assume that high 
conductance level is related to high vigilance. 
Although the data suggest this relationship, 
no statistically significant relationship was 
found. 

Broadbent (1953) has pointed out that in- 
terruptions during the session prevent the 
early rapid decline in performance. Such an 
effect may have occurred in the present study 
due to the conductance hook-up or irregular 
external auditory stimuli, but efforts were 
made to prevent any such interference. 

A GSR of a few milliamperes occurred for 
some Ss when the double jump was presented. 
The amplitude and frequency of the deflec- 
tion decreased as the session continued. In 
a few instances such deflections occurred 
when the Ss did not detect the double jump. 


Summary 


Conductance during a vigilance task and its 
relationship to performance were investigated. 
Apparatus and procedure were similar to those 
used by Mackworth in his “clock” test. Six 
men and three women students were used 
as Ss. 

The conductance trends over the two-hour 
session formed three clusters: ascending in 
four Ss, descending in three Ss, and cyclical 
in two Ss. No significant differences were 
found between the performances of these 
three clusters nor between high and low con- 
ductance groups. The results suggest, how- 
ever, that higher conductance level is asso- 
ciated with better performance. 

Eleven of the 15 errors of commission oc- 
curred during the first half-hour. None of 
these were made by the Ss with the four 
higher conductance levels. 


Received May 7, 1958. 


VOCATIONAL INTERESTS OF NAVAL AVIATION 
CADETS: FINAL RESULTS 1, 2 


ROBERT B. VOAS 3 


U.S. Naval School of Aviation Medicine, Pensacola, Florida 


Motivation for flying is an important 
requisite for success in the flight program. 
Unfortunately, most of the young men who 
enter flight training have had little experi- 
ence with flying, and, therefore, there is little 
on which to evaluate their desire to fly. This 
study attempts to determine whether the in- 
terest patterns of students who complete 
training and students who fail differ in such 
a way as to permit the use of a vocational 
interest test as a selection device. 

The Kuder Preference Record: Vocational, 
Form BM (KPR) (Kuder, 1946) is a stand- 
ard interest inventory which has been used in 
previous attempts to predict flight training 
success. Cerf (1947) reported that the in- 
ventory failed to predict success in Army Air 
Force training during World War II. How- 
ever, Rosenberg and Izard (1954) compared 
the scores of 137 naval aviation cadets who 
took the KPR after leaving the training pro- 
gram with those of 137 cadets who were still 
successfully pursuing their training after nine 
months in the program. They found that the un- 
successful cadets had lower scores on the 
Mechanical and Scientific scales and higher 
scores on the Persuasive, Literary, and Mu- 
sical scales. The present study is a follow-up 
of the work of Rosenberg and Izard designed 
to determine the usefulness of the KPR as a pre- 
dictor of success in the flight program when 
administered before training begins. 


1 The data upon which this report is based have 
been reported under the title: Inventory testing of 
vocational interests of naval aviation cadets: Final 
results. U.S. Naval School of Aviation Medicine 
Research Report No. NM 14 02 11.01, April 1957. 

2 The opinions or assertions contained herein are 
the private ones of the writer and are not to be con- 
strued as official or reflecting the views of the Navy 
Department or the naval service at large. 

3 Now at the Naval Medical Research Institute, 
Bethesda, Maryland. 


Procedure and Results 


In addition to the two groups described 
above, Rosenberg and Izard (1954) adminis- 
tered the KPR to 16 classes of entering 
cadets. The test was administered as part 
of the check-in procedure during the stu- 
dent’s first week in the training program. 
The records of 605 of these cadets were avail- 
able for the present study. Of the 605 ca- 
dets, 465 successfully completed training (S 
group); 74 withdrew from the program at 
their own request (W group); 34 failed in 
some portion of flight training (F group); 
and 32 were eliminated for medical or miscel- 
laneous reasons (M group). Besides com- 
paring these groups on the standard KPR 
scales, a special scale was constructed which 
would reflect the differences which Rosenberg 
and Izard found between successful and with- 
drawing cadets. The test papers of the 137 
students tested after leaving the program and 
the 137 successful cadets tested after nine 
months of training were analyzed. In those 
item triads which demonstrated differences in 
response significant at the P = .01 level or 
better between these criterion groups, a score 
of “1” was assigned to the alternative or al- 
ternatives most frequently marked either as 
most or least interesting by the cadets in the 
withdrawal group. A score of “0” was as- 
signed the responses most frequently marked 
by the successful cadets. In this way an 80- 
item voluntary withdrawal (VW) scale was 
constructed. A high score on the VW scale 
indicated that the individual had an interest 
pattern similar to that of cadets who volun- 
tarily withdraw from flight training, while a 
low score indicated interests similar to those of suc- 
cessful cadets. The mean VW scale score for 
the 137 successful cadets was 25.70 with an 
SD of 12.07, while the mean score for the 
137 withdrawing cadets was 39.86 with an 


Table 1 

Means on the Kuder Scales for Training Criterion Groups by Time of Testing 

Columns, Pretest Results: Successful Cadets (S) Group, N = 465; 
Cadets Who Withdrew (W) Group, N = 74; Cadets Who Failed (F) Group, 
N = 34; Other Attritions (M) Group, N = 32; All Attrition Group, N = 140. 
Columns, Concurrent Test Results a: Successful Cadets, N = 137; 
Cadets Who Withdrew, N = 137. 
Column, Correlation with the MCT: Total, N = 605. 

Rows (Kuder Scales): 1. Mechanical; 2. Computational; 3. Scientific; 
4. Persuasive; 5. Artistic; 6. Literary; 7. Musical; 8. Social Service; 
9. Clerical; 10. Voluntary Withdrawal Scale. 

[Cell values not legible in the source.] 

a See reference: Rosenberg & Izard, 1954. 
b Significantly different from S group at the P < .05 level. 
c Significantly different from S group at the P < .01 level. 
d Significantly different from W group at the P < .01 level. 
e r significant at the P < .01 level. 


SD of 16.10. This difference produces a bi- 
serial correlation of .56 between the VW scale 
and the successful-withdrawal criteria for the 
original standardization group. This figure 
includes chance differences so that shrinkage 
would be expected on cross validation. 
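
The .56 figure can be checked from the reported means and SDs. The sketch below is only an illustration: it assumes the usual biserial formula, r = (M1 - M2) pq / (y s), where y is the normal ordinate at the point dividing the distribution into proportions p and q and s is the SD of the combined group, and it treats the reported SDs as computed with N in the denominator. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def biserial_from_groups(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Biserial r from the means, SDs, and sizes of two criterion groups."""
    total = n_1 + n_2
    p = n_1 / total
    q = 1 - p
    diff = mean_1 - mean_2
    # Variance of the combined group: pooled within-group variance plus the
    # variance contributed by the separation between the two group means.
    combined_var = p * sd_1 ** 2 + q * sd_2 ** 2 + p * q * diff ** 2
    # Ordinate of the unit normal curve at the point cutting off proportion p.
    standard_normal = NormalDist()
    ordinate = standard_normal.pdf(standard_normal.inv_cdf(p))
    return diff * p * q / (ordinate * sqrt(combined_var))


# Means and SDs reported in the text for the standardization groups.
r_bis = biserial_from_groups(39.86, 16.10, 137, 25.70, 12.07, 137)
print(round(r_bis, 2))  # 0.56, matching the value reported in the text
```

Because the 80 items were chosen on these same 274 cadets, the coefficient capitalizes on chance item differences, which is why a smaller value is to be expected in a fresh sample.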
Means on the nine standard scales and the 
VW scale were computed for each of the four 
(S, W, F, and M) criterion groups. These 
data, together with the scores for the two 
groups studied by Rosenberg and Izard, are 
presented in Table 1. To determine the re- 
lationship of these interests to the aptitudes 
which are important to success in flight train- 
ing, correlations were computed between the 
Kuder scales and the tests of the Naval Avia- 


Table 2 

Results of an Analysis of Covariance on the VW Scores 
for the Successful and Total Attrition Groups, 
with the Effect of the MCT Held Constant 

                        Sum of Squares 
                        of Errors of                 Mean 
Source of Variation     Estimate           df        Square 

Total                   101,749.42 
Adjusted 
  Between Groups            579.27                   579.27 
  Within Groups         101,170.15         602       168.06 

F = 3.45, P > .05 



tion Selection Battery. The largest relation- 
ships were found with the Mechanical Com- 
prehension test, a measure of the ability to 
visualize mechanical relationships. The cor- 
relations for this test are also included in 
Table 1. Since the voluntary withdrawal 
scale was significantly related to mechanical 
ability, an analysis of covariance on the VW 
scale scores for the total attrition versus suc- 
cessful group was carried out. The results 
of this analysis appear in Table 2. 
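
The arithmetic behind the F ratio in Table 2 can be retraced as a check. The sketch below is an illustration only: the sums of squares, the within-groups degrees of freedom, and the mean squares are those printed in the table, while the single between-groups degree of freedom is an assumption based on there being two groups (it is consistent with the between-groups mean square equalling its sum of squares). The critical value is an approximate figure for 1 and 602 degrees of freedom, not a value given in the source.

```python
def mean_square(sum_of_squares, df):
    """Mean square: a sum of squares divided by its degrees of freedom."""
    return sum_of_squares / df


def f_ratio(ss_between, df_between, ss_within, df_within):
    """F: adjusted between-groups mean square over within-groups mean square."""
    return mean_square(ss_between, df_between) / mean_square(ss_within, df_within)


SS_TOTAL = 101_749.42
SS_BETWEEN = 579.27
SS_WITHIN = 101_170.15
DF_BETWEEN = 1      # assumption: two groups compared
DF_WITHIN = 602     # as printed in Table 2

# Approximate .05 critical value of F for (1, 602) df; an outside figure.
APPROX_CRITICAL_F_05 = 3.86

f_value = f_ratio(SS_BETWEEN, DF_BETWEEN, SS_WITHIN, DF_WITHIN)
print(round(mean_square(SS_WITHIN, DF_WITHIN), 2), round(f_value, 2))
print(f_value < APPROX_CRITICAL_F_05)
```

The obtained ratio falls short of the approximate critical value, in line with the P > .05 entry in the table.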


Discussion 


The first question to be considered was the 
extent to which the differences between the 
S and W groups on the standard KPR scales 
corroborated the findings of Rosenberg and 
Izard. They found that the voluntary with- 
drawals were significantly lower on the Me- 
chanical and Scientific scales. In the pres- 
ent study the withdrawal group also demon- 
strated lower mean Mechanical and Scientific 
scores; however, the difference for the Scien- 
tific scale was not statistically significant. In 
the earlier study the withdrawals obtained 
higher scores on the Persuasive, Literary, and 
Musical scales. In the present instance none 
of these scales demonstrated statistically sig- 
nificant differences between the W and S 
groups. Thus, only one of the five differ- 
ences reported by Rosenberg and Izard is 
confirmed by the present data. 

A second method of comparing the results 
from the present group with those of Rosen- 
berg and Izard was in terms of the VW score. 
The cadets who voluntarily withdrew from 
the training program did demonstrate a sig- 
nificantly higher VW score than the success- 
ful cadets. Thus, this score does have some 
validity for prediction of success in the train- 
ing program. However, the difference be- 
tween the S and W groups produces a biserial 
correlation of only .17 compared to the .56 
correlation found for the original standardi- 
zation group. 

The differences in vocational interest scores 
between successful and withdrawing cadets 
are larger at the time of separation from the 
program than if measured at the beginning 
of training. Comparison of the S group with 
the successful cadets measured after nine 
months in the program indicates that the 
pretested group had significantly lower Me- 
chanical and higher Scientific, Literary, and 
Musical scores. For the withdrawal group 
the pretested cadets gave lower Persuasive 
and Clerical and higher Mechanical scores. 
Whether these differences are due to varia- 
tions in the set under which the question- 
naire was taken or are due to changes in the 
underlying interests themselves cannot be de- 
termined. However, since the interest pat- 
terns as measured by the KPR change, va- 
lidity at the time of withdrawal gives little 
indication of predictive validity. 

An important consideration in the use of 
the KPR as a predictive measure is its rela- 
tionship to ability factors. The validity of 
this test is not limited to voluntary with- 
drawals. The F and M attrition groups also 
demonstrate small differences from the S 
group on one or more of the Kuder scales. 
In some cases these differences are numeri- 
cally larger than those between the S and W 
groups but since the number of cases in these 
categories is small, the differences are gener- 
ally not statistically significant. However, 
advantage can be taken of the general simi- 
larity in the pattern of differences among all 
the attrition groups by pooling them. The 
total attrition group has significantly lower 
Mechanical and Computational scores and 
significantly higher Literary scores than does 
the successful group. In addition, the VW 
scores for this pooled group were significantly 
higher than for the successful cadets. Thus, 
the VW scale constructed to detect voluntary 
withdrawals appears to be equally effective in 
detecting cadets who will fail for other rea- 
sons such as lack of ability. This finding 
suggests that the KPR interest scores are re- 
lated to the special abilities which are re- 
quired for success in naval aviation training. 

The data for the MCT given in Table 1 
indicate that this ability measure correlates 
significantly with seven of the 10 interest 
scales. Of particular importance is the .30 
correlation with the specially built VW scale. 
Since the MCT is the most valid single meas- 
ure of aptitude for flight training, the va- 
lidity of the VW scale may be based on its 
relationship to mechanical ability rather than 
on the direct effect of the interest pattern it- 
self. The results of the analysis of covari- 
ance which appear in Table 2 demonstrate 
that with the MCT held constant the differ- 
ence between the mean VW scores for the 
successful and total attrition groups was not 
statistically significant. Thus, the validity of 
the VW score appears to be based primarily 
on its relationship to mechanical ability. 

The failure of the KPR mechanical scale 
to predict success in U. S. Air Force flight 
training when tests of mechanical ability 
demonstrated high validity led Cerf (1947) 
to suggest that the relationship between me- 
chanical interests and mechanical ability is 
low. For the naval cadets sampled in this 
study the relationship appears to be greater 
and the interest tests demonstrate some va- 
lidity. Essentially, however, the present 
study is in agreement with that of the Air 
Force in suggesting that vocational interest 
inventories of this type are not of great value 
for predicting training outcome. Where in- 
terest tests do demonstrate some validity, 
ability measures can probably be found which 
will cover the same variance and avoid the 
problem of faking which often invalidates 
these questionnaires. 


Summary 


This paper reports a study of the validity 
of the Kuder Preference Record as a pre- 
dictor of success in flight training. This in- 
ventory was administered to 605 naval avia- 
tion cadets on entrance into flight training. 
Scores of the successful cadets were com- 
pared with those of cadets who withdrew or 
failed in the training program. The KPR demon- 
strated small but statistically significant va- 
lidity for prediction of all categories of at- 
trition. However, when differences in me- 
chanical ability were controlled, this inventory 
did not show a significant relationship to the 
pass-fail criterion. It was concluded, there- 
fore, that the vocational interests measured 
by this inventory do not have an important 
relationship to success in flight training ex- 
cept as they reflect the presence or absence 
of the special mechanical skills required in 
flying. 


Received May 9, 1958. 


One of the most important goals in maga- 
zine readership research is the development of 
valid and reliable methods of measuring indi- 
viduals with respect to whatever psychologi- 
cal attributes we might reasonably assume to 
be related to the reading of magazines. This 
goal is based on the assumption that the men- 
tal and emotional activity involved is not 
haphazard nor random, but has measurable 
causes within the individual personality. 

Studies which related demographic data to 
readership represented the first attempts by 
mass communications researchers to find func- 
tional variables of magazine readership. This 
approach was found to be of limited useful- 
ness as the analysis of readership data pro- 
gressed from simple studies of item popularity 
to attempts to understand unique reading pat- 
terns. The search for more basic explanations 
led to the attempt to measure personality fac- 
tors, which, if they could be found to be dis- 
criminators of individuals and groups of indi- 
viduals with respect to the items they choose 
to read, could lay the foundation for more 
adequate explanations of magazine reading 
behavior. 

The Allport-Vernon Study of Values was 
selected as the experimental measurement for 
this study for the following reasons: (a) Its 
orientation tended to be similar to attitude 
and opinion studies commonly made of gen- 
eral populations; (b) It was not a clinical 
measurement; its approach to personality 
measurement was in the area of normal per- 
sonality; (c) Evaluative judgments, as used 
in the Study of Values, appeared to be closely 
related to the type of behavior involved in 
magazine reading. 

Some historical justification for expecting 
that the Study of Values would be effective 
in relating magazine reading to values did 
exist. A. G. Woolbert showed that it was an 
effective predictor of recall of experimental 
newspaper items (Cantril & Allport, 1933, 
p. 265). Other early studies, reported by 
Cantril and Allport in 1933, demonstrated 
that general evaluative attitudes influence the 
activities of everyday life. 


1 The authors are indebted to Herbert C. Ludeke, 
Manager, Development Division, Research Depart- 
ment, The Curtis Publishing Company, for his guid- 
ance and support throughout this project. 


The investigators were, of course, aware of 
the conflicting opinions about the Study of 
Values. The hypothesis was made, however, 
that this measurement would validly and re- 
liably differentiate groups of individuals with 
respect to the six values of the test, and that 
these values would be related to magazine 
reading. 


Revising the Study of Values 


In the form in which it is presently pub- 
lished, the Study of Values was found to be 
considerably above the vocabulary level of 
noncollege general populations found in na- 
tional studies of magazine reading. In addi- 
tion, certain items appeared to be too spe- 
cialized or of too limited interest to general 
populations, as they assumed a cultural level 
far above average. For the purposes of our 
study, a revision was clearly necessary. 

The Curtis revision of the Study of Values 
attempted to reduce the vocabulary and cul- 
tural levels to that of readers of mass circula- 
tion magazines, while at the same time, to do 
as little violence as possible to the underlying 
design and wording of the original test. The 
investigators were at all times mindful of the 
necessity of modernizing and simplifying the 
language of the test without substantially 
changing its design or thought. It was also 
found necessary to simplify the original scor- 
ing and marking system, and to develop a 
new type of score sheet for office use. All the 
revisions were made empirically through the 
combined experience of the investigators in 
preparing, testing and analyzing survey ques- 
tionnaires used with national samples of 
magazine readers. 

The following example will demonstrate the 
level of the revision: 

Item 20, Part I: Original Version (Allport, 
Vernon, & Lindzey, 1951a) 

Which of the following would you consider the 
more important function of education? (a) its prepa- 
ration for practical achievement and financial re- 
ward; (b) its preparation for participation in com- 
munity activities and aiding less fortunate persons. 


Revised Version 


The aim of schools should be: (a) to prepare stu- 
dents to get good jobs; (b) to prepare students for 
community activities and helping others. 

The Curtis revision of the Study of Values 
has been field tested in a variety of situa- 
tions. Following extensive pretesting within 
the Curtis Publishing Company, the revision 
was administered as a house-to-house survey 
interview with noncollege housewives in a 
middle-class suburban Philadelphia neighbor- 
hood in order to test the feasibility of the re- 
vision in a typical field interviewing situation. 
Results of this field test indicated that the re- 
vision was usable under these conditions. 

Its most rigorous field test to date was 
made when it was administered to 300 Ss in 
a six-city study to test the validity of the re- 
vision. Five methods of administering the 
Study of Values were used; the test was con- 
ducted both with individuals and with groups, 
in both office and home situations. 

These field testing procedures demonstrated 
that the Curtis revision of the Study of Values 
is a practical instrument under field condi- 
tions, in that all five methods were successful 
in gaining respondent cooperation and in es- 
tablishing the understandability of the test 
items, the scoring and the instructions. The 
success of the personal interview method, in 
particular, indicates that the revision can be 
used on a national sample basis with maga- 
zine reader audiences. 


Establishing the Reliability of the Revision 

Reliability tests were conducted on two 
groups of Ss: the Junior Class of the Radnor 
Senior High School of Wayne, Pennsylvania, 
and a group of new industrial employees of 
The Curtis Publishing Company. Both groups 
met the educational and cultural specifications 
of the study: no S had received more than 
a high school education, although the high 
school group contained many who planned to 
go on to college. The handicap imposed on 
the findings through the use of these groups 
was, of course, recognized at the time of their 
selection. The groups comprised individuals 
different in many ways from groups of mass 
circulation magazine readers found in national 
studies. If, however, the revision of the 
Study of Values proved to have acceptable 
reliability for the two experimental groups, 
it might be argued that the revision would 
be operating under less difficult conditions 
among adult readers of national magazines, 
and could be considered reliable for work 
with reader audiences. 

Both administrations to the high school 
group were made by the investigators them- 
selves, with a time lapse of four months be- 
tween the two tests. The time lapse for the 
industrial group was one month between the 
two tests, with the administration conducted 
by members of the Personnel Department of 
The Curtis Publishing Company. Table 1 
shows the test-retest correlations for the two 
experimental groups. 

The repeat reliability coefficients shown in 
Table 1 were somewhat lower than those re- 
ported by Allport, Vernon, and Lindzey for 
their 1951 revision of the Study of Values. 
A longer time period, however, was involved 
for the high school group in this current 
study than for the groups tested in the 1951 
revision study. The mean repeat reliability 
coefficients were .83 for the high school group 


Table 1 

Test-Retest Product-Moment Reliability Coefficients 
for the Curtis Revision of the Study of Values 

                 Test-Retest Reliability Coefficient 
                 Student        Industrial 
                 Group a        Group b 
Value            (N = 77)       (N = 58) 
Theoretical        .81 
Economic           .85 
Aesthetic 
Social 
Political 
Religious 

a Time lapse between tests: four months. 
b Time lapse between tests: one month. 


Table 2 

Split-Half Reliability Coefficients for the Curtis 
Revision of the Study of Values 

                 Split-Half Reliability Coefficient 
                 Student        Industrial 
                 Group          Group 
Value            (N = 77)       (N = 58) 
Theoretical        .85            .57 
Economic           .73            .51 
Aesthetic          .70            .66 
Social             .48            .58 
Political          .54            .47 
Religious          .86            .63 

Note. Reliability coefficients of whole test, calculated by 
applying the Spearman-Brown prophecy formula to the corre- 
lation between split halves. 



and .78 for the Curtis industrial employees, 
using a z transformation. These mean co- 
efficients may be compared with the mean 
of .89 for the Allport, Vernon, and Lindzey 
(1951b) revision. 

The student group, as shown in Table 1, 
had higher test-retest reliability than the in- 
dustrial employees, despite the fact that the 
time interval between the two administra- 
tions of the revision was four times longer for 
the student group. These higher reliability 
coefficients are all the more noteworthy since 
nonadult, immature Ss would be expected to 
show less stability on the Allport-Vernon 
values than the adult, nonstudent group of 
Curtis employees. 

Split-half reliability was tested by divid- 
ing the revision into two subscales, the sub- 
scales being composed in such a manner that 
there was approximately the same number 
of pairings between the value under study and 
all remaining values. Table 2 shows the split- 
half reliability coefficients for the two experi- 
mental groups. 
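
The note to Table 2 states that whole-test coefficients were obtained by applying the Spearman-Brown prophecy formula to the correlation between halves. A minimal sketch of that step follows; the function name and the sample half-test correlation are hypothetical, since the source reports only the stepped-up coefficients.

```python
def spearman_brown(r_half, length_factor=2.0):
    """Estimate reliability of a test lengthened by length_factor.

    With the default factor of 2 this is the split-half case:
    r_whole = 2 * r_half / (1 + r_half).
    """
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)


# Hypothetical correlation between two half-scales (not from the study).
print(round(spearman_brown(0.60), 2))
```

The correction always raises a positive half-test correlation, so the coefficients in Table 2 are larger than the raw correlations between the two subscales would have been.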

The mean reliability coefficient, using a z 
transformation, was .72 for the group of 
Radnor High School students and .57 for the 
group of industrial employees. This com- 
pares with a mean reliability coefficient of 
.82 for the Allport, Vernon, and Lindzey 
(1951b) revision of the Study of Values. 
Split-half reliability coefficients were lower 
than the repeat reliability coefficients, as was 
also true for the Allport, Vernon, and Lindzey 
(1951b) revision. One possible reason for 
the lower split-half reliability coefficients is 
that the Study of Values does not contain 
simple items dealing with only one value; 
pairings of values take place in every item. 
Random methods of selecting items for 
equivalent forms of the test cannot be ap- 
plied; the investigators had to approximate 
the pairings of values for the items selected 
for split-half reliability testing. 
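
The mean coefficients quoted above can be reproduced from the Table 2 entries. The sketch below assumes that "using a z transformation" means Fisher's z: each coefficient is converted with the inverse hyperbolic tangent, the z values are averaged without weighting, and the mean is converted back. The function name is hypothetical.

```python
from math import atanh, tanh


def mean_correlation_via_z(coefficients):
    """Average correlations through Fisher's z and convert the mean back to r."""
    z_values = [atanh(r) for r in coefficients]
    return tanh(sum(z_values) / len(z_values))


# Split-half coefficients from Table 2, in the order Theoretical, Economic,
# Aesthetic, Social, Political, Religious.
student_group = [0.85, 0.73, 0.70, 0.48, 0.54, 0.86]
industrial_group = [0.57, 0.51, 0.66, 0.58, 0.47, 0.63]

print(round(mean_correlation_via_z(student_group), 2))     # 0.72
print(round(mean_correlation_via_z(industrial_group), 2))  # 0.57
```

Averaging in z rather than in r gives more weight to the high coefficients, which is why the student mean comes out slightly above the simple arithmetic mean of the six values.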

The investigators were satisfied that the 
results of these tests of reliability demon- 
strated that the Curtis revision was suffi- 
ciently reliable to warrant further work with 
the test, even though it might be argued that 
the reliability figures would have been some- 
what higher if reliability coefficients had been 
established for groups of adults more similar 
to those found in national populations of 
magazine readers. 

As far as the discriminative power of the 
items in the revision was concerned, 96 out 
of the 120 choices in the test successfully dis- 
tinguished between Ss whose score indicated 
that they ranked high on a test value and 
those ranking low on the value. On the other 
hand, 24 out of the 120 choices did not dis- 
tinguish between highs and lows. These 24, 
however, were evenly distributed among the 
six values. Analyzing them further, it was 
found that only one item was totally worth- 
less, in that neither choice was diagnostic. In 
general, the 24 failures were accepted by the 
investigators as a limitation of the Curtis re- 
vision which did not materially impair its 
value with respect to the purposes for which 
it was designed. 


Testing the Validity of the Revision 


The hypothesis that the Curtis revision of 
the Study of Values has validity with respect 
to the relationship between the six values as 
measured by the revision and expressed inter- 
est in reading magazine stories and articles 
was tested by means of a study conducted on 
300 Ss (150 men and 150 women) in six 
Eastern cities: New York, Trenton, Allen- 
town, Providence, Columbus, and Cincinnati. 

The Curtis revision was administered to 
each S, along with a questionnaire on reading 
interests containing a list of titles of 35 maga- 
zine nonfiction articles and 33 magazine-type 
short stories. The titles were designed to
cover 10 nonfiction and 11 fiction topics, with 
various themes and appeals within each topic. 
Interest in reading each of these items was 
registered by means of a thermometer scaling 
device, with high and low temperatures indi- 
cating high and low interest in reading the 
items. This thermometer scale had previ- 
ously been validated as a predictor of reading 
interest and other behavior. (At the same 
time, attitudes toward several national maga- 
zines were studied by means of semantic dif- 
ferential tests, the results of which are be- 
yond the scope of the present report.) In- 
terviewing for this study was conducted by 
the Alan C. Russell Marketing Research or- 
ganization. 

Two questions to be answered by this study 
were, first, whether or not individuals scoring 
high on a given value differed significantly 
from those not scoring high on the value with 
respect to interest shown in story and article
titles; and second, if there were differences, 
how well did reading interests correspond to 
the values as measured by the test? For the 
purposes of this analysis, individuals were
considered “high” on a value if their score
was one standard deviation or more above the
mean for that value; individuals were consid-
ered to have shown positive interest in a title
if they rated it 80 degrees or higher on the 
thermometer scale. Using these definitions, 
differences in interest in each story or article 
title for individuals high in each value were 
tested for significance. 
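
The classification rules just described can be illustrated with a short
sketch. It is not the investigators' procedure: the report does not name
the significance test used, so a pooled two-proportion z test is assumed
here, and "high" is read as a score at or above the mean plus one sample
standard deviation. All function names are hypothetical.

```python
import math
import statistics


def flag_high(scores):
    """Flag scores at or above mean + 1 SD (assumed reading of 'high')."""
    cutoff = statistics.mean(scores) + statistics.stdev(scores)
    return [s >= cutoff for s in scores]


def flag_interested(ratings, threshold=80):
    """Flag thermometer ratings of 80 degrees or higher as positive interest."""
    return [r >= threshold for r in ratings]


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z test (an assumed test, not named in the report).

    x1 of n1 'high' scorers and x2 of n2 other scorers showed positive
    interest. Returns (z, two-sided p value).
    """
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("no variation in interest; test is undefined")
    z = (x1 / n1 - x2 / n2) / se
    # Two-sided tail area of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def compare_interest(value_scores, ratings):
    """Compare positive interest in one title between high scorers and others."""
    high = flag_high(value_scores)
    keen = flag_interested(ratings)
    n1 = sum(high)
    n2 = len(high) - n1
    x1 = sum(1 for h, k in zip(high, keen) if h and k)
    x2 = sum(1 for h, k in zip(high, keen) if not h and k)
    return two_proportion_z(x1, n1, x2, n2)
```

Under these assumptions, a title would be counted as discriminating for a
value when the p value returned by compare_interest falls at or below .05,
corresponding to the 5 per cent level referred to in the results.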

Results indicated that 29 of the 33 fiction
titles and 32 of the 35 nonfiction titles
showed significant differences in interest among
persons scoring high in the six values. Differ-
ences significant at the 5 per cent level of
confidence or better were then examined 
against the characterization of the six value 
types as described by Allport and Vernon in 
the Manual of Directions for the Study of 
Values, with the following results. 

The dominant interest of the “theoretical” 
person, according to the authors of the Study 
of Values, is the discovery of truth; his inter- 
ests are described as empirical, critical, and 
rational; he is seen as an intellectualist, with 
interests in science (1951b). In the current 
study, those persons who scored high in the
theoretical value indicated a significantly
higher interest in reading two of the three 
articles on science. For the third article, 
dealing with science in relation to human 
happiness rather than with “pure” science, 
they were not significantly higher. They also 
showed higher interest than others in science 
fiction. On the other hand they were signifi- 
cantly lower than other people in interest in 
reading the articles on domestic arts and on 
religion. They showed lower interest also in 
fiction themes dealing with romantic or senti- 
mental aspects of love, home, and children. 
In all, the high theoretical people in this 
study showed significantly different interest 
in reading 14 of the 33 stories and 15 of the 
35 articles. 

The “economic” person as described in the
Study of Values Manual of Directions is in- 
terested in the utilitarian, the tangible, the 
practical, and “conforms well to the prevail- 
ing stereotype of the average American busi- 
ness man.” In the present study, those per- 
sons scoring high in the economic value indi- 
cated significantly higher interest than the 
noneconomic people in all sports articles, and 
in fiction dealing with sports and the West. 
They were significantly less interested than 
others in articles dealing with the theoretical 
aspects of a topic, as opposed to the practical, 
“how-to” aspects, whether the topic was medi- 
cine, religion, science, or entertainment. In 
all, the high economic people showed signifi- 
cantly different interest in reading three of 
the 33 stories and 12 of the 35 articles. 

The “aesthetic” person, as defined by the 
authors of the Study of Values, is one who 
has a dominant interest in beauty, harmony, 
symmetry; he need not necessarily be crea- 
tive, but is highly interested in the artistic. 
Those scoring high in the aesthetic value in 
the present test indicated significantly higher 
interest than nonaesthetic persons in the two 
articles dealing with aspects of American cul- 
ture. They showed significantly less interest 
in reading the magazine-type fiction items in-
cluded in this study: of the 33 fiction titles,
high aesthetic scorers were lower in interest 
in 23. They also showed less interest in read- 
ing the nonfiction items than did nonaesthetic 
people: of the 35 nonfiction titles, they indi- 
cated significantly lower interest in 17. 

The “social” person, as described by the
test authors in the instruction manual, is one 
whose highest value is altruistic or philan- 
thropic love of people, and who therefore 
tends to be sympathetic and unselfish. In 
the present study, those scoring high in the 
social value showed significantly higher in- 
terest in articles whose theme indicated em- 
phasis on help or service to mankind. They 
also showed significantly higher interest in 
articles that centered on the home and family. 
In fiction their preferences ran to themes of 
human relationships: romance, home, and 
children, as well as stories of personal rela- 
tionships against medical or business settings. 
They were significantly less interested than 
other persons in sports, whether fiction or 
nonfiction. In all, high social scorers dif- 
fered significantly from others on 17 fiction 
and 12 nonfiction titles. 

The “political” person, according to the 
test authors, is interested primarily in power, 
in all competition and struggle, not exclu- 
sively in politics. In the present study, those 
with high political scores expressed higher 
interest than others in reading articles on 
politics and on crime that dealt with con- 
flict. They were also significantly higher in 
interest in competitive sports in both fiction 
and nonfiction. In addition they expressed 
preferences for war fiction. They were sig- 
nificantly lower than nonpolitical people in 
interest in articles on religion and on do- 
mestic articles of the home-service variety. 
In all, they differed significantly from others 
on four fiction and 11 nonfiction titles. 

The “religious” person, as described by 
Allport and Vernon (1951b), is mystical, and 
has as his highest value unity, or the relating 
of himself to the cosmos as a whole. Those 
scoring high in the religious value in the pres- 
ent study showed significantly higher interest 
in all the religious articles included in the 
test. They were also higher than others in 
interest in articles dealing with some aspect 
of charity, and with family-service topics. 
In fiction they were significantly higher in in- 
terest for those stories of human relationships 
that seemed to be concerned with human 
problems. In all, high religious scorers dif- 
fered significantly from others on five fiction 
and 14 nonfiction titles. 


In summary, the data showed many areas 
where there were plausible relationships be- 
tween the values as defined by the Allport- 
Vernon Study of Values and interest in read- 
ing the test items. The value scores were re- 
markably effective in discriminating among 
the types of stories and articles of interest to 
the various value groups. It should be noted 
that this study of validity was set up as a 
pilot study prior to a proposed full-scale na- 
tional sample survey which would be the 
necessary and final step in establishing the 
validity of the test with respect to the rela- 
tionship between the values tested and inter- 
est in reading magazines. From the evidence 
already on hand, however, it seems reason- 
able to assume that the Curtis revision of the 
Study of Values may be of considerable value 
to readership researchers in their attempts to 
understand reading behavior as it pertains to 
magazines. 

Summary 


One goal of magazine readership research 
is to develop measurements of psychological 
attributes related to readership of magazines. 
The Allport-Vernon Study of Values was 
chosen for study, and was revised for use 
with national samples of noncollege, general 
populations, under the conditions of house-to- 
house field survey interviewing. The revision 
was found to fulfill at least the minimum re- 
quirements of reliability. A pilot study made 
on 300 Ss tested the hypothesis that the 
Study of Values has validity with respect to 
interest in reading magazine fiction and non- 
fiction. Significant differences were found in 
reading interest for individuals of different 
values, and the value scores were effective in 
discriminating among the types of material 
chosen for reading by different groups of in- 
dividuals. 